10 big data science challenges facing humanitarian organizations
This article was originally published on the Innovation website of UNHCR, the UN's refugee agency: http://innovation.unhcr.org/10-big-data-science-challenges-facing-humani...
Big data refers to the astonishing amount of information that is created as a by-product of the growing digitization of our lives – our use of mobile phones, social networks, search engines, online payment methods, apps, and so on.
What is exciting for the development and humanitarian community is that, if we can extract patterns from these datasets, we could have a whole range of real-time information about people that previously would only have been available with months of planning and at high cost. And in some cases, such as in countries affected by conflict, no data would have been available at all. We can look at big data not as the new oil but as the new green energy: it can be fully recycled and used for different purposes and to solve different problems.
We are already starting to see examples of how big data can help support both sustainable development and humanitarian action. But while innovative projects are showing the potential of big data, we have to remember that there are still challenges that we need to overcome.
1. Identifying the right problems where new data sources can help.
Big data analysis by itself is not a solution but a tool to solve an existing problem. With many fascinating big data sources available, innovators in humanitarian organizations can get carried away by the data sources they have access to, even when using them adds little or no value to the organization. Articulating the problem precisely maximizes a project's potential returns. You will have to iterate to refine your problem statement, because we often do not know what we do not know. Looking for inspiration in other projects and building on lessons learned can help. Here are 20 examples of data innovation projects from UN Global Pulse.
2. Data on its own will not yield insights.
Data access is just part of the journey. To distil insights from raw data you need clear methodologies, supportive research, and a good team. And you also need allies and partners who can work with you and help develop your data innovation project. There is no silver bullet, and recent hype oversimplifies what can and cannot be done with big data. These facilitation tools can help you design a data innovation project.
3. Finding data translators and data therapists.
Humanitarians and data people don’t usually speak the same language: they do not share a common vocabulary or context, and often cannot align their goals. Humanitarian organizations need hybrid profiles, i.e. data translators who are able to understand and interpret both sides of the discussion. In many cases, what you might think is a big data problem already has an existing and tested solution – all you need is some data therapy sessions. This is the composition of multidisciplinary teams within UN Global Pulse Labs.
4. Validating big data architectures with small data foundations.
Make sure you understand how your big data sources relate to the real world and how things are typically done. While data cleaning and preparation might be an art, data analysis is a science – and as such it requires robust and tested methodologies. One of the first things you need to know is how you are going to validate and evaluate your proposed methodology. This research predicts poverty and wealth from mobile phone metadata and validates the results with mobile phone surveys and Demographic and Health Survey (DHS) data.
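As a rough illustration of what validating against small data can look like, the sketch below compares estimates derived from a big data source with survey values for the same regions, using the Pearson correlation and the mean absolute error. The function names and all the numbers are hypothetical and chosen only for illustration; they are not taken from the research mentioned above.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / math.sqrt(var_x * var_y)


def mean_absolute_error(predicted, truth):
    """Average absolute gap between predictions and ground truth."""
    if len(predicted) != len(truth) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    return sum(abs(p - t) for p, t in zip(predicted, truth)) / len(predicted)


# Hypothetical poverty rates for four regions (illustrative values only).
survey_rates = [0.20, 0.35, 0.50, 0.65]   # small data: household survey
model_rates = [0.25, 0.30, 0.55, 0.60]    # big data: model estimates

r = pearson(model_rates, survey_rates)
mae = mean_absolute_error(model_rates, survey_rates)
print(f"correlation with survey: {r:.2f}, mean absolute error: {mae:.2f}")
```

A high correlation with a low error suggests the big data estimates track the survey; a weak correlation is a signal to revisit the methodology before relying on the results.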
5. Reverse engineering representativeness from big data sources.
Do not expect your big data source to be a perfect demographic sample. Often, instead of a carefully designed statistical sample of 0.1% of the population that covers all segments, you will have a data source that covers 30% of the population but is skewed toward particular demographic characteristics. Accept this and take advantage of it. You can see examples of demographic sampling in this research that proposes a proxy for unemployment statistics in Spain based on social media fingerprints and another study on the penetration of mobile phones and phone usage patterns in Kenya.
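One common way to account for a skewed sample is post-stratification: compute the indicator separately for each demographic group, then weight each group by its known share of the real population. The sketch below is a minimal example under stated assumptions; the group labels, population shares, and observations are hypothetical, and it is not the method used in the studies mentioned above.

```python
def poststratified_mean(sample, population_shares):
    """Reweight group means by each group's share of the real population.

    sample: list of (group, value) pairs from the skewed data source.
    population_shares: dict mapping group -> share of the population,
        taken from a trusted source such as a census (should sum to 1).
    """
    totals, counts = {}, {}
    for group, value in sample:
        totals[group] = totals.get(group, 0.0) + value
        counts[group] = counts.get(group, 0) + 1
    missing = set(population_shares) - set(counts)
    if missing:
        raise ValueError(f"no observations for groups: {sorted(missing)}")
    return sum(
        share * totals[group] / counts[group]
        for group, share in population_shares.items()
    )


# Hypothetical data: 1.0 means the person reports being employed.
# Young people are over-represented (8 of 10 observations).
sample = (
    [("young", 1.0)] * 6 + [("young", 0.0)] * 2
    + [("old", 1.0)] * 1 + [("old", 0.0)] * 1
)
# Hypothetical census shares: the real population is mostly older people.
population_shares = {"young": 0.4, "old": 0.6}

raw_mean = sum(value for _, value in sample) / len(sample)
adjusted = poststratified_mean(sample, population_shares)
print(f"raw mean: {raw_mean:.2f}, post-stratified mean: {adjusted:.2f}")
```

The adjustment only works for groups that appear in the data at all, which is why the function refuses to produce an estimate when a population group has no observations.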
6. The black box does not have a heart (yet).
Any data project must respect privacy principles. Before undertaking any project, you need to conduct a privacy and risk impact assessment to make sure that you are aware of the potential risks that accessing or using certain data might create for individuals and groups. The risks are not only related to data access – the methods of analysis must also be considered carefully, because some algorithms can work as biased black boxes. We will need to figure out how to embed human rights principles in future AI (artificial intelligence) systems. These are the current UN Global Pulse data privacy principles and a recent report from the White House on the future of AI.
7. The divide is coming.
Many private sector companies are creating structures that allow them to make data-driven decisions about their business. The big data science field is still in its early days, so the entry barrier is relatively low. However, the field is growing exponentially, so if humanitarian organizations wait too long to put together their data-savvy units, entering the field might become too expensive. There are many possible structures an organization can use, from a very small team of data translators with outsourced data operations, to a centralized data science team, to distributed data-literate units across the organization. Data Science Africa is a unique forum, cultivating a community of experts and students applying data science to development and humanitarian challenges.
8. Finding the right moment to introduce data innovations during emergencies.
Before testing a big data innovation in an ongoing emergency, you ideally need to have built a proof of concept and a prototype, based on a realistic retrospective scenario or simulation. From there, you need to find the right balance when introducing the new approach into existing workflows and operations, respecting the unique strains on staff and responders during an emergency. Co-creation of prototypes with users on the ground is key to generating useful tools. The recent UNHCR Beyond Technology 2015 report provides multiple examples of innovation within emergencies.
9. Data does not always tell you what you want to hear.
Many times the noise is bigger than the signal and the data doesn’t reveal anything meaningful. Other times, data can reveal too much: something unexpected or additional information that can hurt the people you are trying to help, your stakeholders or even your own team. Trust in data innovation is not gained overnight.
10. The impact of new big data innovations has not yet been measured. So what?
The key question is: what decisions can be made based on new data insights? Measuring the impact of those data-driven decisions will help make the business case for big data innovation in the development and humanitarian sectors. Once humanitarian practitioners understand the ROI of big data based on impact, we can start measuring the actual costs (financial and human) of not using these new sources of data, and streamline the scaling and adoption mechanisms.
What are the major challenges that your organization faces in leveraging the big data revolution? Can you relate to any of the above? Tell us in the comments below.