Alternative Use of Traditional Data in Times of COVID-19


Now nearing the end of September 2020, Indonesia has reported almost 275,000 COVID-19 cases since the first confirmed case in early March. In terms of detected cases alone, this is the second-highest count in South-East Asia, after the Philippines. With cases on the rise yet again, parts of Indonesia have brought back the soft lockdown more commonly known as PSBB, an acronym for Pembatasan Sosial Berskala Besar (Large-Scale Social Restrictions). At this stage, balancing a reduction in disease incidence against limiting the economic fallout is understandably a tough task, especially with poverty incidence on the rise. Public policy professionals would call this a “wicked problem”.

At Pulse Lab Jakarta, we’ve been working with our government counterparts and others, including UN agencies, on a variety of fronts in relation to COVID-19. For the last several months we have been working with Jabar Digital Service (JDS) of the West Java provincial government, in collaboration with our main government counterpart, the Ministry of National Development Planning (Bappenas). JDS was established to help narrow the digital divide within the province and to increase the uptake of technology in community life as well as in government. One of its key mandates is to improve the efficiency and accuracy of data- and technology-based policy making in West Java.

West Java is one of the provinces that originally implemented PSBB, given its large number of cases and its proximity to the epicentre in the capital city, Jakarta. With a view to developing a more localised, targeted and dynamic approach to imposing and lifting PSBB, our team joined forces with the West Java government and Bappenas to develop insights. These would support the West Java government’s decision making in opening up pockets of the province to economic and social activity whenever conditions merited it.

The pandemic has brought data innovation to the fore, and much of the demand has been for alternative data of high spatial and temporal resolution, a demand that infrequent traditional official statistics cannot meet. As we have talked about before, there is much that still needs to be done to make the use of such alternative data the norm rather than the exception. But part of data innovation is also about finding new uses for official data that was collected for other purposes. That was the case for part of our work for JDS, where the aim was to identify areas based on their transmission risk and transmission potential for the spread of the disease.

Re-Using Existing Administrative Data

Several factors contribute to the spread of COVID-19, such as the movement of people. However, mobility by itself (even coupled with disease incidence data) is not enough: two areas with similar mobility and similar disease incidence might end up with different outcomes because of other factors such as population density. To better understand the transmission potential of each area, we need data on transmission factors, preferably at the most granular level available, to support localised interventions.

After some research, we identified the Village Potential census (PODES) as a good candidate to understand structural factors that could affect transmission of the disease. PODES was last conducted in 2018 and was the best government dataset we had access to, both in terms of content and spatio-temporal scales. This census is typically used to inform policy makers on the development of the villages across the country. The iteration in 2018 was also particularly intended for the government to verify the outcomes of village fund implementation that started in 2014.

We were able to access this data via Bappenas. Following several conversations with JDS (and the subject experts they were able to convene) and with the team at the Data and Information Centre within Bappenas, we came up with the idea of identifying areas in West Java where micro-scale social restrictions could be implemented. These micro-scale social restrictions are known as PSBM, an acronym for Pembatasan Sosial Berskala Mikro.

There are clear advantages to (re)using an existing, traditional administrative dataset such as PODES to inform policy decisions. First and foremost, it was the most granular government dataset we had access to that was also relevant to the insights we wanted to develop. Second, the Government already has a high degree of confidence in data that it collected itself (via a well-defined and replicable methodology) and whose provenance is easily established. This means less effort is required on our side both to establish the reliability of the data and to convince the Government to use it. Third, reuse adds value to the existing administrative data infrastructure, which supports the quality of future administrative data collection. Fourth, the data exists in a structured format and is therefore ready to use almost immediately, without extensive pre-processing.

For this work we derived two relevant metrics from PODES: (i) a transmission potential index and (ii) transmission risk. The transmission potential index is the baseline measure representing the possible capacity for coronavirus transmission in each village before its first case. To establish the potential index, the Lab consulted with epidemiologists and determined that the baseline index should consist of the following factors, all of which have corresponding variables in PODES: transportation, slum areas, meeting points and sanitation. The first three factors have a positive sign, meaning that a higher score on each factor accelerates transmission. Sanitation has a negative sign, meaning that a higher score delays transmission.
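To make the sign convention concrete, here is a minimal sketch of a composite index built from the four factors. The post does not describe the Lab's actual weights or scaling, so the equal weighting, the min-max scaling, the field names and the sample scores below are all assumptions made for illustration.

```python
from typing import Dict, List

# Signs follow the text: the first three factors accelerate transmission,
# sanitation delays it. Equal weights are an assumption, not the Lab's method.
FACTOR_SIGNS = {
    "transportation": 1,
    "slum_areas": 1,
    "meeting_points": 1,
    "sanitation": -1,
}


def min_max(values: List[float]) -> List[float]:
    """Rescale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def potential_index(villages: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Sum the signed, min-max scaled factor scores for each village."""
    names = list(villages)
    totals = {name: 0.0 for name in names}
    for factor, sign in FACTOR_SIGNS.items():
        scaled = min_max([villages[name][factor] for name in names])
        for name, score in zip(names, scaled):
            totals[name] += sign * score
    return totals


# Hypothetical factor scores for two villages.
sample = {
    "village_a": {"transportation": 3, "slum_areas": 2, "meeting_points": 5, "sanitation": 1},
    "village_b": {"transportation": 1, "slum_areas": 0, "meeting_points": 1, "sanitation": 4},
}
index = potential_index(sample)
```

In this toy input, village_a scores higher on the three accelerating factors and lower on sanitation, so it ends up with the higher index.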

Integrating PODES with Facebook Population Density Map

One crucial data point was not available in the PODES data: population density. To calculate a population density score, one would typically rely on the population census, whose high spatial resolution allows population density to be aggregated to the village level, on par with PODES. Unfortunately, as in most countries, the population census in Indonesia is conducted every 10 years, and the most recent available data is from 2010 (the 2020 census is still underway). We therefore needed to find more recent population data with a spatial resolution similar to or finer than that of PODES.

To come up with the population density score, the Lab decided to use the Facebook Population Density Map (FPDM) data, accessed through the Humanitarian Data Exchange (HDX) as part of Facebook’s Data For Good Programme. FPDM data was last updated in 2018, the same year as PODES, and it covers all of Indonesia at 30-metre resolution. To integrate this data with the PODES data, the Lab aggregated it to the village level. To test its validity, our team compared it against population figures from the 2018 SUSENAS (national socioeconomic survey), another administrative dataset, collected at the district or city level. The result was quite robust, with a correlation coefficient of 0.98.

Mapping total population density estimates at 30m resolution in West Java.
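The aggregation and validation steps described above can be sketched roughly as follows. This assumes each grid cell has already been matched to a village through a spatial join with village boundaries (not shown), and every identifier and number here is hypothetical rather than taken from the FPDM, PODES or SUSENAS files.

```python
from collections import defaultdict
from math import sqrt
from typing import Dict, Iterable, List, Tuple


def aggregate_population(cells: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Sum per-cell population estimates into one total per village."""
    totals: Dict[str, float] = defaultdict(float)
    for village_id, people in cells:
        totals[village_id] += people
    return dict(totals)


def density(population: Dict[str, float], area_km2: Dict[str, float]) -> Dict[str, float]:
    """People per square kilometre for each village."""
    return {v: population[v] / area_km2[v] for v in population}


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical grid cells, each already tagged with the village containing it.
cells = [("v1", 10.0), ("v1", 5.0), ("v2", 7.0)]
village_pop = aggregate_population(cells)
village_density = density(village_pop, {"v1": 3.0, "v2": 2.0})

# Validation idea: compare aggregated estimates with survey figures
# for the same areas (made-up numbers).
r = pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

The same summing step can be repeated at the district or city level, so that the gridded estimates are compared with survey figures at the level where the survey reports them.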

After calculating the transmission potential, the next step was to assess the metric that combines transmission potential with case data: the transmission risk. For COVID-19 analysis, transmission risk is an ex-post measure. Villages without cases have zero risk regardless of their baseline score, while villages with cases have a positive risk that depends on both the number of cases and the baseline potential. Transmission risk is therefore dynamic, as it follows the number of cases in each village. The public version of the dashboard, which combines additional data sources, can be accessed here.
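A minimal sketch of a risk function with these properties is shown below. The post does not give the actual formula, so the multiplicative form and the requirement that the baseline score be rescaled to a strictly positive value are assumptions made purely for illustration.

```python
def transmission_risk(cases: int, baseline: float) -> float:
    """Illustrative ex-post risk: zero without cases, otherwise cases x baseline.

    `baseline` is assumed to be a potential score already rescaled so that it
    is strictly positive; the product form is an assumption, not the Lab's formula.
    """
    if cases < 0:
        raise ValueError("cases must be non-negative")
    if baseline <= 0:
        raise ValueError("baseline must be rescaled to a positive value")
    if cases == 0:
        return 0.0
    return cases * baseline


# Hypothetical villages: (confirmed cases, rescaled baseline score).
villages = {"v1": (0, 0.9), "v2": (4, 0.5), "v3": (4, 0.8)}
risk = {name: transmission_risk(c, b) for name, (c, b) in villages.items()}
ranked = sorted(risk, key=risk.get, reverse=True)
```

Here v1 has the highest baseline but no cases, so its risk is zero, while v3 outranks v2 because both have the same case count and v3 has the higher baseline.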

Insights, limitations and challenges

The insights from both metrics could be used to inform further policy interventions throughout West Java. When faced with crucial decisions on the where, when, and how in relation to restarting economic activities or reopening schools for example, the insights we co-developed can help in part to inform policy makers’ decision making.

The static spatial and temporal measures we constructed from PODES, though not too outdated, need to be coupled with more up-to-date dynamic data on a host of important factors, such as mobility and case incidence, when informing the decisions that need to be made. The eventual prototype tool we built incorporated this additional data as well. What the initial phase of our work with JDS has shown, though, is that leveraging data innovation for development is not just about finding new uses for alternative data, but also about finding new ways to use existing official statistics that have traditionally been collected.
