The New Data Landscape
Global Pulse is predicated on the understanding that today “new data” is being generated as a by-product of people’s activities at a rate that is unprecedented in human history. That deluge of passively produced data may hold insights about how, for better or worse, people’s lives are impacted by shocks like volatility in food and fuel prices.
Part of Global Pulse’s approach is learning how to distill information from data so we can identify insights that can be useful in program planning and, in the long run, can contribute to shaping policy. This blog post is meant to serve as an introduction to some of our current thinking. Future posts will detail particular applications of these data types, examining both specific projects and key challenges.
While “new data” has in many ways become our lingua franca, developing real-time information streams is also very much about new ways of collecting traditional data and about facilitating the sharing of existing data and knowledge across sectors and institutions. With that caveat in mind, here are some descriptions of what we call new data:
Data Exhaust (Services as Sensors)
As communities in the developing world increasingly adopt and use new technologies, particularly mobile phones and services provided over mobile networks, they generate ambient data as by-products of their everyday activities (“data exhaust”). This data exhaust may be key to detecting early signals of change, including emerging vulnerability or incipient harm. Even individuals without direct access to mobile phones or other technologies may still be passively emitting information as they go about their daily lives (e.g., when they make purchases, even at informal markets; when they access basic health care; or when they interact with local community leaders). For more on this, see the blog post “Digital Smoke Signals.”
As a subset of this type of ambient data, the UN and other development organizations around the world regularly collect programme data about the communities in which they work, such as:
- Operational data streams that facilitate logistical elements critical to the way programmes function (e.g. stock levels, school attendance);
- Records and/or logs of how people access services, what services they are accessing, and changes in their use of services;
- Needs assessments, rapid surveys, or focus group discussions used to design or implement programmes;
- Evaluation data about the effectiveness of programmes, particularly where it points to changing behaviors.
Another source of data exhaust is information-seeking behavior, which can be used to infer people’s needs, desires, or intentions. This includes Internet searches, telephone hotlines such as Question Box in Uganda and India, and government service lines such as 311 or other types of information hotlines.
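To make the idea of inferring needs from information-seeking behavior more concrete, here is a minimal sketch that tallies incoming hotline or search queries by need category. It assumes queries are available as plain-text strings; the category names, the keyword lists in `NEED_KEYWORDS`, and the function `tally_needs` are hypothetical and chosen only for illustration. Real keyword sets would have to be built for the local language and context.

```python
import re
from collections import Counter

# Hypothetical mapping from need categories to indicative keywords.
# Real deployments would need locally built, multilingual keyword sets.
NEED_KEYWORDS = {
    "food": {"maize", "price", "hunger", "ration"},
    "health": {"malaria", "clinic", "fever", "vaccine"},
    "employment": {"job", "wage", "work"},
}


def tally_needs(queries, need_keywords=NEED_KEYWORDS):
    """Count how many queries mention at least one keyword of each category.

    A single query can count toward several categories, but at most once
    per category.
    """
    counts = Counter()
    for query in queries:
        # Lowercase and keep only alphabetic tokens so "price?" matches "price".
        words = set(re.findall(r"[a-z]+", query.lower()))
        for need, keywords in need_keywords.items():
            if words & keywords:
                counts[need] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical example queries, not real hotline data.
    sample = ["Where is the nearest clinic?", "Maize price today", "fever and malaria"]
    print(tally_needs(sample))
```

Tracking these tallies over time, rather than looking at a single snapshot, is what would let shifts in expressed needs stand out.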
Online Information: News, Social Media, e-commerce & more
Digital content is growing exponentially, and the ability to mine it provides a real-time data collection opportunity. Much of the content of press agencies and other traditional media outlets is publicly available online and in databases, making it possible to survey global perspectives almost instantly. Furthermore, social media, discussion forums and e-commerce sites can give real-time snapshots of what a community is experiencing. Data mining methodologies make it possible to identify trending topics (e.g., through keyword searches), to find correlations between traditional and new economic indicators (e.g., national statistics vs. online price data), or to uncover qualitative information about a population’s preoccupations through social media (e.g., Twitter).
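As an illustration of correlating traditional and new economic indicators, the following sketch compares an official price statistic with an online price index. It assumes both are aligned monthly series of equal length; the function names and the sample values are hypothetical, not real data. Comparing month-over-month changes rather than raw levels is a deliberate choice: two series that both trend upward will correlate strongly in levels even if their short-term movements are unrelated.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(var_x * var_y)


def month_over_month(series):
    """Percentage change between consecutive observations."""
    return [100.0 * (b - a) / a for a, b in zip(series, series[1:])]


if __name__ == "__main__":
    # Hypothetical monthly values for illustration only.
    official_cpi = [100.0, 101.2, 102.9, 103.5, 105.8, 107.1]
    online_price_index = [100.0, 101.5, 103.4, 103.9, 106.6, 107.7]
    r = pearson(month_over_month(official_cpi), month_over_month(online_price_index))
    print(f"correlation of monthly changes: {r:.2f}")
```

A coefficient near 1 would suggest the online series tracks the official one closely; in practice, lags, seasonality and differences in coverage would also need to be examined before drawing conclusions.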
Information is increasingly being provided directly by citizens through a variety of media and crowdsourcing techniques. Most famously, the Ushahidi map platform, which was developed in Kenya during the 2007 post-election unrest so that citizens could report incidents of violence, has since been used for emergency response in Haiti, to monitor elections, to report incidents of sexual harassment in Egypt, and more. At the same time, the development community is increasingly working with crowdsourcing technologies to create trusted source communities in rural areas, for example among health workers or local government officials, who can be tapped when direct inputs are needed.
Further work on crowdsourcing methodologies, including validation and verification, could produce information streams useful for development planning in the future. In the meantime, crowdsourcing can play an important role in verifying other types of data. For example, where early warnings are being raised through other sources of information, the trusted crowd may be able to help confirm or refute hypotheses.
Physical Sensors
Another experimental type of new data focuses on the impact of changes in human behavior on the physical environment. Such impacts can be assessed through a variety of existing methodologies, from real-time monitoring of water quality with dedicated sensors to satellite imagery.
Satellite imagery is a particularly relevant example: it has long been used to track long-term dynamics (such as climate change) as well as very short-term dynamics (such as disaster response). However, few attempts have been made to understand what images can reveal about the underlying socioeconomic structure, even though this is a compelling possibility. In other words, the aim is to find socioeconomic “fingerprints” in images obtained by sensors. Specifically, we are interested in measuring changes on a weekly or monthly basis that can shed light on human behaviors, including coping mechanisms. For instance, satellite images of rooftops might reveal the use of new building materials, which might be a sign of urban development. Additionally, automated pattern recognition techniques could help identify migratory flows or fluctuations in the number of livestock in a marketplace.
Of course, it is not enough just to have this data. Perhaps the most important goal is developing an understanding of how it can be useful. Global Pulse is developing methodologies whereby decision-makers will be able to integrate these data streams into their existing monitoring efforts to detect anomalies, form hypotheses, gather evidence, verify the underlying causes and take action to improve the well-being of the populations they serve, especially in addressing the impact of global shocks and crises.
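The first step in that chain, detecting anomalies, can be sketched with a simple rolling z-score. This is a generic baseline technique used here for illustration, not Global Pulse’s methodology; it assumes a single indicator observed at regular intervals (for example, hypothetical weekly counts of calls to an information hotline). The function `flag_anomalies` and its `window` and `threshold` defaults are illustrative assumptions.

```python
from statistics import mean, pstdev


def flag_anomalies(values, window=8, threshold=3.0):
    """Return indices whose value lies more than `threshold` standard
    deviations from the mean of the preceding `window` observations.

    Illustrative rolling z-score; window and threshold are assumptions.
    """
    flagged = []
    for i in range(window, len(values)):
        # Baseline uses only earlier points, so a spike cannot mask itself.
        history = values[i - window:i]
        mu = mean(history)
        sigma = pstdev(history)
        if sigma == 0:
            # Flat history: treat any change at all as anomalous.
            if values[i] != mu:
                flagged.append(i)
            continue
        if abs(values[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    # Hypothetical weekly hotline call counts, not real data.
    weekly_calls = [10, 11, 10, 12, 11, 10, 11, 12, 30, 11]
    print(flag_anomalies(weekly_calls))
```

Because each point is compared only with the observations before it, a sudden spike does not inflate its own baseline. A flagged point is a prompt to form hypotheses and gather evidence, not a conclusion in itself.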
It cannot be emphasized enough that “new data” does not stand alone. In order to be used appropriately, it will have to be integrated with traditional sources of data, such as official statistics, rapid impact assessments, etc. The exact combination of data streams that will be useful to each stage of the monitoring process will greatly depend on the particular issue at hand.
The relevant indicators that point to populations under stress will need to be based on local context; the same is true of the relevant constellation of data exhaust. As we experiment, it should become clearer what role each type of information can play. In addition, the availability of data and the types of signals the data yields will differ from country to country. It is to be expected, for example, that countries with high mobile phone and Internet penetration rates will generate more digital exhaust that comes directly from citizens; that countries with large aid communities will produce more programme exhaust than less aid-dependent countries; and that countries with a vibrant local business environment will have more service offerings generating exhaust. Differences across age groups, income brackets, genders and geographic locations are also likely to shape the exhaust.
Ideas? We welcome them! Send us your thoughts.