Can Twitter data be used to track changing food commodity prices?
Aug 11, 2014
Imagine going to your local shop today and finding out that the price of your favourite type of meat has suddenly gone up, or perhaps is on special offer. Would you tweet it to your followers to complain about the price hike or, on the other hand, to spread the word about the deal? And imagine replicating scenarios like this hundreds of times across a country. Could these “digital traces” provide us with sufficient signal to estimate food prices on a daily basis? Would this allow the collection of real-time statistics that are sufficiently accurate or “good enough” to guide the work of policy makers?
Indonesia, a country which ranks among the top five in the world for total number of Twitter users and is spread over more than 18,000 islands (a real challenge for data collection!), is in many ways the ideal place for testing this hypothesis and its potential practical implications.
Today we are happy to announce the launch of a project microsite that summarises the results of a Pulse Lab Jakarta research project done in conjunction with Bappenas and the World Food Programme in Indonesia. The research focused on three target commodities: beef, chicken and onion. We mined Twitter for mentions of prices of these commodities and developed a model to analyse them, on the assumption that today’s price can be inferred from yesterday’s price. On the site, you will be able to compare the prices as inferred from our model with the actual government data. (For more information on this project, please visit the research project page. A detailed methods paper on the project will be available soon.)
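To make the idea concrete, here is a minimal sketch in Python of what "inferring today's price from yesterday's price" could look like. This is not the project's actual model, which will be described in the methods paper. The price pattern, the function names `extract_prices` and `nowcast`, and the `max_change` and `weight` parameters are all assumptions made for illustration. The sketch pulls rupiah amounts out of tweets, discards quotes that are implausibly far from yesterday's estimate, and moves the estimate part of the way towards the median of the remaining quotes.

```python
import re
from statistics import median

# Hypothetical pattern: matches rupiah amounts such as "Rp 95.000" or "Rp95,000".
PRICE_RE = re.compile(r"rp\.?\s*(\d{1,3}(?:[.,]\d{3})+|\d+)", re.IGNORECASE)


def extract_prices(text):
    """Return all rupiah amounts quoted in a tweet as integers."""
    return [int(re.sub(r"[.,]", "", m)) for m in PRICE_RE.findall(text)]


def nowcast(prev_price, tweets, max_change=0.3, weight=0.5):
    """Estimate today's price from yesterday's estimate and today's tweets.

    Quotes that differ from yesterday's price by more than max_change
    (as a fraction) are treated as noise. With no usable quotes, the
    estimate stays at yesterday's price.
    """
    quotes = [
        price
        for tweet in tweets
        for price in extract_prices(tweet)
        if abs(price - prev_price) <= max_change * prev_price
    ]
    if not quotes:
        return prev_price
    # Blend yesterday's estimate with the median of today's quoted prices.
    return (1 - weight) * prev_price + weight * median(quotes)
```

For example, calling `nowcast(90000, tweets)` on a day when the usable tweets all quote Rp 100.000 would move the estimate halfway towards that figure, while a day with no price mentions would leave the estimate unchanged.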
So, can you actually nowcast food prices based on Twitter? The short answer, based on our analysis, is yes – but with some important caveats.
While the prices of both beef and chicken are relatively stable, changes were consistently discussed on Twitter, with spikes in activity corresponding to increases in price. For these commodities, the model data was generally predictive of official pricing in the long term.
The price for onions, on the other hand, fluctuated considerably during the course of the project. Two price increases in April and August 2013 corresponded with clear increases in Twitter activity and the model data had a strong correlation with the official data. However, when the price decreased, there tended to be fewer tweets, and the model data remained the same while official prices dropped. When prices stabilized, there was almost no response on Twitter. It appears, then, that people reacted on social media to noticeable increases in price but not to decreases or longer-term price stability.
Overall, we found that Indonesians do tweet about food prices in real time, creating data that could be used as a proxy for, or complement to, official food statistics and perhaps provide early warning of unexpected spikes. The potential cost savings compared to traditional data collection could also be significant.
But of course, these results need to be treated with caution and, as with any big data project, further investigation and fine-tuning will be needed to validate the results over time. Here are some questions for follow-up research that we are considering:
- Could the model be applied to other commodities? Our preliminary investigation on chilli prices yielded some intriguing results (more details to come about this in a follow-up blog post).
- What about regional price variation? Assuming we could geo-locate a sufficient number of tweets, could we pick up enough signal to investigate regional variations in prices?
- How does this methodology compare with other approaches to enhancing the price data collection process through big data (most notably, web crawling as in the MIT Billion Prices Project)?
As we continue to collect and analyze our data, we are open to suggestions for improvement: Have you conducted similar research in other countries? Are there ways to improve our model? Could you see the same approach replicated in other emerging markets?
We look forward to your comments.