Monitoring and Evaluation (M&E) for Big Data
The world is becoming an increasingly interconnected and complex place. An event in one part of the world can have a large and rapid impact in another. As a result of technology, we are producing digital data in higher volumes, and from increasingly diverse sources. Some of these digital data can be harnessed for public good, for global development or for humanitarian action.
However, as new data analysis techniques rapidly emerge, the monitoring and evaluation (M&E) tools we currently use are not responsive, interconnected, or adaptive enough to react to this rate of change. A data revolution is needed in the field of M&E, as with many other areas of data analysis.
NEW DATA SOURCES
Digital data comes from diverse sources and is generated in different ways.
In development practice, existing M&E methods use data that is either:
- Continually, actively, routinely collected through public service systems during implementation
- Non-routinely, actively collected at intervals in time e.g. through surveys.
New digital data sources present us with a new type of information – digital signals. These are digital traces that are continually generated through both active and passive activities. As with routine collection, the data is generated continuously. However, digital data differs because it is generated outside of public sector information systems.
New computational techniques, typically using algorithms, enable us to reveal trends, patterns, and correlations in data, and visualize it to turn it into actionable information. Advances in technology also enable us to store greater volumes of data, and process it at faster speeds. Examples of these new approaches include:
- Machine learning: a type of artificial intelligence focused on using computers to develop and implement algorithms that can learn from experience i.e. when they are exposed to new data
- Data mining: an interdisciplinary field (including artificial intelligence, machine learning, statistics, and database systems) that explores large data sets with the aim to discover patterns
- Sensemaking: an interdisciplinary field that focuses on using intelligent systems to find insights in (make sense of) large amounts of information by interpreting them in context.
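To make the idea of "learning from experience" concrete, the following is a minimal sketch of a model that adjusts itself each time it is exposed to a new observation. The class name `OnlineLinearModel`, the learning rate, and the observations are all hypothetical and invented for illustration; this is not a method described by the authors.

```python
class OnlineLinearModel:
    """Fits y ≈ weight * x, updating after every new observation."""

    def __init__(self, learning_rate=0.05):
        self.weight = 0.0  # the model starts with no knowledge
        self.learning_rate = learning_rate

    def predict(self, x):
        return self.weight * x

    def update(self, x, y):
        # Nudge the weight in the direction that reduces the prediction error.
        error = y - self.predict(x)
        self.weight += self.learning_rate * error * x
        return error


# Made-up observations that follow y = 2x (illustrative only).
observations = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

model = OnlineLinearModel()
error_before = abs(6.0 - model.predict(3.0))

# Expose the model to the data repeatedly; it improves with each pass.
for _ in range(50):
    for x, y in observations:
        model.update(x, y)

error_after = abs(6.0 - model.predict(3.0))
print(f"error before: {error_before:.3f}, error after: {error_after:.6f}")
```

The point of the sketch is only that the prediction error shrinks as more data is seen; real machine learning applications use far richer models and data.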
Much of ‘big data’ practice involves using data that was generated because of another activity or for another purpose. It is secondary data that already exists, so we don’t have the opportunity to design the data collection frames and can’t specify in advance what a big data set must contain or how it is structured.
Jules Berman gives some comparisons between ‘big’ vs. ‘small’ data:
The more complex or large the data, the less likely it is that traditional data processing applications can be used. But in reality the distinction between ‘big’ and ‘small’ data sources is not clear-cut. Data sources fall somewhere along a continuum from simple structured data to complex unstructured data.
When using data to gain insights for development, the same statistical considerations need to be applied regardless of how the data is defined. Areas that should be considered include: data quality, inference, causality, and bias.
Big data projects use diverse data sources and often require input from multiple organizations and disciplines (including those within the private sector), so extensive collaboration is required to pull the information together. A legal and regulatory environment that supports the sharing of data is also required. As new ‘big’ data sources and analytic methods emerge, their applications for M&E also require experimentation and iteration.
Decision-makers gain an understanding of a situation through interpreting and extrapolating the pieces of information they have. This information is incomplete, ‘asymmetric’ (decision-makers and beneficiaries have different information to each other) and comes from a combination of diverse sources.
M&E aims to:
- Provide information that closes this knowledge gap with the goal of targeting available resources to the areas of greatest need for beneficiaries
- Improve accountability
Big data sources, technologies and innovative approaches have the potential to provide complementary, actionable information for decision-making in the development sector.
EXPERIENCES & OBSERVATIONS TO DATE
Here at Pulse Lab Jakarta, we conduct proof-of-concept Big Data for Development projects, and explore new approaches to evaluation and M&E. Some early observations drawn from our experience point to the following:
1. The way that big data is applied in the development sector is likely to differ from the way it is applied in many commercial organizations
In many commercial organizations, big data is applied on a closed, relatively simple feedback loop. For example, when looking at shoppers’ transactions, there is a direct connection between product design, sales and revenue: a company has the opportunity to experiment with product specifications which gives rise to positive or negative consumer feedback via increased or decreased revenue.
Development interventions address social problems where the relationship between interventions and target beneficiaries is usually more complex. It is likely that any feedback loop between development interventions and beneficiaries will not be so well defined. In addition, many of the variables that are relevant to a development intervention will not be captured by digital data. Where data does exist, the data ecosystem is unlikely to be continuous: it will most likely be fragmented across formats and systems that differ both by organization and by the units within them. These silos of information don’t just exist on one horizontal ‘layer’, e.g. different units within the local government. They also exist vertically. For example, information from citizens may be held in silos in Civil Society Organizations, but not fed into silos in the local government – and then not linked to silos in the national government.
The types of automated analytics that may be appropriate to gain rapid insights for relatively simple feedback loops in the private sector have some applications in the development sector, for example an agricultural infrastructure sensor, where the feedback loop is clearly defined. However, in many cases automated analytics will be insufficient in isolation, and appropriate interpretation of the data will rely heavily on incorporating contextual information into the analysis. In these instances it is likely that the use of mixed methods will be essential to make sense of ‘big data’.
2. Having a strong theoretical basis behind big data research and using qualitative techniques to incorporate contextual information for data interpretation will be essential to gain actionable insights
If data is analyzed remotely, the likelihood of making incorrect assumptions will be amplified because of the loss of contextual information.
To separate signal from noise, a strong basis in theory is essential, such as having a hypothesis prior to analysis. The incorporation of contextual information for interpretation of results is also crucial. Qualitative, participatory methods (like interviews, focus groups, or direct interpretation of the data itself by stakeholders including the community) should facilitate a better understanding of the data – as opposed to interpreting data from a distance.
Mixed-method approaches are likely to be more robust than one method used independently. Different types of data have different strengths and weaknesses and one type does not replace the other. Information gained from different sources and methods is complementary. If you need data that is a highly representative sample of your population to answer your questions, then you may choose a household survey. If you want secondary data from an additional communication channel to gain insights on the community perception of an intervention, you might go for Twitter. It is not an either/or scenario and each type of data provides an additional piece of information.
Big data alone does not capture the complete picture, but, rather, enables you to see patterns that can be triangulated with other data sources. As well as providing new information, comparing new data sources with existing data sources facilitates better validation of information and can be used to develop stronger data collection systems.
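As a rough illustration of what triangulation can look like in practice, the sketch below compares a series from an existing source with a series from a new digital source and checks whether they move together. The function name, the variable names, and all of the numbers are hypothetical and invented for illustration; they are not results from any project, and a correlation on its own says nothing about causality or bias.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(var_x * var_y)


# Made-up monthly values (illustrative only):
# an indicator from an existing survey and a count from a digital source.
survey_indicator = [100.0, 102.0, 105.0, 103.0, 108.0, 110.0]
digital_signal = [40, 44, 52, 47, 60, 63]

r = pearson(survey_indicator, digital_signal)
print(f"correlation between the two sources: {r:.2f}")
```

A strong correlation would suggest the new source tracks the existing one and may be worth investigating further with stakeholders; a weak one is a prompt to look at context, coverage, and data quality before drawing conclusions.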
If you are interested in collaborating with us on a big data project, you can get in touch with the team via Twitter @PulseLabJakarta