Pouring Data Science Into Real-World Scenarios to Predict the Spread of COVID-19

This text was originally published by UNHCR Innovation Service. Main image by UNHCR Innovation Service.

A collaborative project at a settlement for forcibly displaced people in Bangladesh is not only helping UNHCR anticipate health and safety interventions. It has created data models that could have widespread applications.

Since the beginning, trying to forecast the spread of COVID-19 has been much like trying to keep a wave on the beach. Not only is it a new virus, but it’s often been elusive and slippery as scientists have raced to understand how it spreads — and how to stop it.

Perhaps nowhere is the challenge more complicated than in refugee settlements, with refugees and other people of concern living in small shelters where multiple families usually live together, making social distancing nearly impossible. What’s more, despite the best efforts of humanitarian organizations including the UN Refugee Agency (UNHCR), the reality is that quarantining is difficult, if not impossible, as people must leave their shelters to get food and water or use the toilet.

UN Global Pulse (UNGP), the UN Secretary-General’s initiative to use big data and emerging technologies for development, humanitarian action, and more, joined in efforts to predict the spread of disease using its expertise. UNGP studied ways to use data and epidemic modeling to predict how COVID-19 might spread in a settlement in Cox’s Bazar, Bangladesh, working with teams in the settlement, including public health officials, to generate insights that were used to create data models. The data models resulted in predictions that could then be used to help UNHCR personnel in the Cox’s Bazar settlement and beyond make more informed decisions.

The project also offered a chance for interagency as well as external collaboration. The team included UNHCR’s Innovation Service, UNGP, UNHCR Cox’s Bazar settlement public health and information management teams, World Health Organization (WHO) public health teams, the Office for the Coordination of Humanitarian Affairs (OCHA), as well as the IBM/MIT Watson Artificial Intelligence (AI) Lab, which has specific expertise in data visualization, and Durham University. This was reported in a previous story that detailed the modeling process.

In fact, some of the data modeling was based on a project Joseph Aylett-Bullock, Project Lead and Data Scientist and Researcher at UNGP, had been working on with academic researchers to predict the spread of COVID-19 in the UK. In March 2020, UNGP was already trying to map all the different applications of AI that were being used in the effort to tackle COVID-19. That inspired UNGP to consider if it could model a settlement for refugees or internally displaced people (IDP), even though it wouldn’t be using AI.

“These places are almost certainly not being served by epidemic modeling efforts in the same way as the Global North is and, particularly important, are areas where disease can rapidly spread,” Aylett-Bullock says. “So UNHCR’s Innovation Service put us in contact with people in the Cox’s Bazar settlement, because we felt this would be something we could partner on and would be useful to them.”

A deep pool for predicting trends

In some ways, a settlement for forcibly displaced people provides a better opportunity to conduct the kind of scenario-based modeling involved in this project. Although it falls under the category of predictive modeling — literally trying to predict what might happen next — there’s a general assumption that predictive modeling is all about forecasting numbers. But scenario-based simulation modeling is different. Instead, it predicts general trends that might emerge if specific interventions are taken, such as implementing mask-wearing. Turning numbers into predicted trends provides guidance that settlement personnel can use to make decisions about the steps necessary to keep people safe. Once a step such as mask-wearing is implemented, new data can then be used to adjust the potential outcomes accordingly, because more data makes predictive models more accurate.
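To make the distinction concrete, here is a minimal sketch of scenario-based comparison. It is not the project's model: it uses a simple discrete-time compartmental model as a stand-in, and every number in it (population, transmission rate, recovery rate, and the effect of mask-wearing) is a hypothetical placeholder rather than a value from the Cox’s Bazar work. The point is only that the output of interest is the difference in trend between scenarios, not any single forecast figure.

```python
def simulate_sir(population, initial_infected, beta, gamma, days):
    """Run a simple discrete-time SIR model and return the daily infected counts.

    beta is the daily transmission rate and gamma the daily recovery rate.
    Both are illustrative assumptions, not values from the project.
    """
    susceptible = float(population - initial_infected)
    infected = float(initial_infected)
    curve = [infected]
    for _ in range(days):
        new_infections = beta * susceptible * infected / population
        new_recoveries = gamma * infected
        susceptible -= new_infections
        infected += new_infections - new_recoveries
        curve.append(infected)
    return curve


# Hypothetical scenarios: each maps to a multiplier on the transmission rate.
SCENARIOS = {
    "no intervention": 1.0,
    "mask-wearing": 0.7,  # assumed 30% reduction, for illustration only
}


def compare_scenarios(population, initial_infected, base_beta, gamma, days, scenarios):
    """Summarize each scenario by the height and timing of its infection peak."""
    results = {}
    for name, multiplier in scenarios.items():
        curve = simulate_sir(population, initial_infected,
                             base_beta * multiplier, gamma, days)
        peak = max(curve)
        results[name] = {"peak_infected": peak, "peak_day": curve.index(peak)}
    return results


if __name__ == "__main__":
    summary = compare_scenarios(10000, 10, 0.3, 0.1, 365, SCENARIOS)
    for name, stats in summary.items():
        print(f"{name}: peak of {stats['peak_infected']:.0f} "
              f"infected on day {stats['peak_day']}")
```

Under these assumed parameters, the mask-wearing scenario produces a lower and later peak than the baseline. That relative trend, rather than the exact counts, is the kind of result scenario-based modeling is meant to surface.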

The project’s collaboration made use of aggregated and anonymized data from individuals depicting certain behavior patterns, via KoBo surveys, a tool for collecting humanitarian data. In addition, it used data gathered by the UNHCR Microdata Library, which provides powerful insights since it contains data collected at the individual or household level directly by UNHCR or indirectly through its partners, but supported in some way by UNHCR. The rest of the data that was used, including public health and education datasets, is available through different portals such as UNHCR’s open-data portal, the Education Cluster portal, ReliefWeb, and the Humanitarian Data Exchange (HDX).

“We keep trying to push the fact that people don’t actually need to be within the UN system to have all these partnerships — it’s open-source and done to a very high standard,” Aylett-Bullock says.

Humanitarian and scientific expertise flow together

The UNGP team took the data provided by the UNHCR Microdata Library and the other sources mentioned above and modeled different protective measures to imagine specific scenarios and determine the effect those measures would have on the spread of COVID-19. The scenarios included how quarantine might work through home care of those with COVID-19, the impact of mask-wearing, and the ramifications of school reopenings. Staff working in the Cox’s Bazar settlement were integral to the collaboration, because they knew what information they needed most, particularly behavioral insights into how people of concern move within the settlement.

For the Innovation Service, teamwork with UNHCR and WHO colleagues as well as humanitarian sector leads based in the Cox’s Bazar settlement was vital to the project’s success.

Both WHO and UNHCR colleagues on the ground could provide solid background information on the settlement. The settlement had, unfortunately, experienced previous outbreaks of diseases such as diphtheria, so historical epidemiological information was available to show how different diseases had been transmitted in the past. This provided additional epidemiological context when trying to model a new disease, because the team could look at the previous interventions that were successful in limiting disease spread.

Two-way translation of science and human needs fosters smooth sailing

The collaborative nature of the project dashes the misperception that academics and data scientists only work in so-called ivory towers, whether in universities or UN offices, away from the realities of the world. What’s more, the work conducted across these teams is a positive example that creative interagency collaboration, both within the UN and with academia, is possible.

Rebeca Moreno Jimenez, an Innovation Officer and Lead Data Scientist at UNHCR’s Innovation Service, oversees all Innovation Service projects related to AI and big data, to ensure each project is consistent with data protection, data responsibility, ethics and human rights-based approaches. But she has other roles, too, and in this project, her involvement focused more on mentoring innovation methodologies and, perhaps most important, serving as a humanitarian/data science translator between UNHCR colleagues working in the Cox’s Bazar settlement and the data scientists.

“In this case, a humanitarian challenge gives you a data problem: What do we need to do in terms of emergency preparedness and humanitarian response in order to avoid the spread of COVID-19 numbers?” she explains. “The settlement went into lockdown early on, but still we found ways to survey those who are forcibly displaced in Cox’s Bazar settlement and get continuous feedback from the settlement managers. It was important to have translation in terms of simulation work between colleagues working in the settlement and the data scientists and developers.”

The team created questionnaires, cleared by the UNHCR Information Management Team, and provided them to people of concern so they could collect their own data and return it to UNHCR. This unique approach facilitated data collection despite COVID-19 restrictions in places UNHCR could not reach.

“Translator skills are so necessary because there’s the question of how you will interpret a figure,” Moreno Jimenez says, underscoring the need to “translate” data into contextual information that can be applied in real-world decisions and actions. “It depends on the operational context. To say 20% doesn’t provide much context, but if I tell you two out of 10 people are infected in a small population, then this is different. And that creates an image in your head of the number of people impacted.”

Ultimately, she says, success relied on bringing together an academic community, the other UN agencies working with the humanitarian community, and the settlement managers to provide feedback — and then of course to present the results.

“To merge the three communities made this an applied data science project rather than a theoretical data science project,” Moreno Jimenez says. “Data and epidemic modeling are tools that provide additional information to use in evidence-informed decision-making on humanitarian challenges. We presented the results so settlement managers could make decisions about issues such as keeping areas of the settlement open for freedom of movement — which is what UNHCR advocates — or about the importance of closing the schools, even if it has a negative impact on refugee education rates.”

Measuring the waves to make informed decisions

Because the schools did close, the decision whether or not to reopen them provides a clear demonstration of exactly how the modeling provided tangible data to settlement personnel. The data scientists looked at four major scenarios, one of which was what would happen if the Cox’s Bazar settlement reopened the schools. The team created a data model to predict the outcome if the teachers and students wore masks and there was increased ventilation, and another to examine the idea of reducing class sizes by having smaller groups of students attend every other day.

Especially because many of the schools are already open-air bamboo structures with plenty of natural ventilation, the models revealed that the settlement could largely reduce the potential harms of having bigger groups of children gather if they used a combination of mask-wearing, alternating class attendance, and ventilation. This gave settlement personnel data to consider when making decisions about schools, especially if the rate of infected people waxes and wanes, as it has been doing around the world.

Another key scenario the team modeled was designed to examine different health care delivery mechanisms. The settlement set up isolation centers relatively quickly for people who tested positive for COVID-19. But the settlement team was concerned about those centers being overrun, so the team created a model to look at the outcome if people with symptoms simply stayed home and quarantined, and only the most severe cases went to the isolation centers. This was a complex question, because there really is no good way to quarantine in the Cox’s Bazar settlement, with roughly seven people per shelter — and no separate bedrooms — and the only toilets being outside their individual shelters. But the modeling provided potential directions.

“The data suggested that there was limited epidemiological advantage to quarantining people in isolation centers rather than in their own shelters. This is because clinical data suggests that you’re more infectious before you become symptomatic, so if you’re only quarantining people after they show symptoms then they are highly likely to have already infected most of their shelter, given the living conditions in the settlement,” Aylett-Bullock explains. “The modeling allowed the personnel running the settlement to consider ways to help people stay in their shelters, such as delivering them food packages.”

The team has shared their data findings up through UNHCR and the UN, so they can have informed discussions about measures such as mask-wearing in other settlements. The information was also shared with other humanitarian operations and in the region of Bangladesh. So far, OCHA has already started using other types of models in South Sudan, Afghanistan, Iraq, Somalia, and beyond, such as the traditional SIR (susceptible, infected, recovered) models for infectious diseases.
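For readers unfamiliar with SIR models, the standard textbook form is sketched below. It divides a population of size N into susceptible (S), infected (I), and recovered (R) groups. The symbols β (transmission rate) and γ (recovery rate) are the conventional notation, not parameters reported by OCHA or the project team.

```latex
\begin{aligned}
\frac{dS}{dt} &= -\beta \frac{S I}{N}, \\
\frac{dI}{dt} &= \beta \frac{S I}{N} - \gamma I, \\
\frac{dR}{dt} &= \gamma I,
\end{aligned}
\qquad S + I + R = N, \qquad R_0 = \frac{\beta}{\gamma}.
```

In this framing, an intervention such as mask-wearing would typically be represented as a reduction in β, which lowers the basic reproduction number R_0 and flattens the infection curve.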

“Of course, it’s a challenge translating research into implementation, even within the settlement,” says Anjali Katta, a Scaling and Development Researcher with UNGP. “And you have to consider the different contexts, such as the open bamboo school structures in the Cox’s Bazar settlement compared to North American classrooms where the doors and windows are closed. But we collected varied lists of sources that frame and caveat our results.”

These models can be applied to other situations or used by other data scientists to create models for their own scenarios. They also helped UNHCR hold firm to their position that refugees should maintain their freedom of movement within any settlement as much as possible, while taking steps to mitigate the spread of disease.

The project has been so successful that the WHO Regional Office for the Eastern Mediterranean (EMRO) and UNHCR Somalia reached out to take a similar approach to modeling the spread of COVID-19 in Somalia in two different scenarios: one modeling the spread of the disease among IDP populations and another modeling the spread in the host community. At the same time, UNGP has started modeling potential vaccination campaigns in the Cox’s Bazar settlement to understand the impact of the vaccine among its residents.

Clearly, the value of predictive and simulation modeling is becoming increasingly apparent across UNHCR and the humanitarian sector, rising up like a wave that’s too big to ignore. In particular, a collaboration like this one demonstrates how a combination of data and human intellect can lead to better decisions and policies for the good of people of concern everywhere.
