Can social media analytics provide insights relevant for communicable diseases control in Indonesia?
CASE STUDY: Middle East Respiratory Syndrome (MERS)
What is MERS CoV?
Middle East respiratory syndrome (MERS) is a viral respiratory disease caused by a novel coronavirus (MERS-CoV) that was first identified in Saudi Arabia in 2012. The virus appears to be circulating widely throughout the Arabian Peninsula. All recent cases reported outside the Middle East were in people who were infected in the Middle East and then travelled outside the region.
The above information was taken from WHO’s Global Alert and Response FAQ page – please visit the page directly for more details. The latest information on MERS CoV cases can be found in WHO Disease Outbreak News.
In response to the MERS outbreak, one of WHO’s recommendations is that countries enhance their surveillance for severe acute respiratory infections. So we asked ourselves – can we capture information about potential cases of MERS CoV in Indonesia using social media?
Indonesians have a bit of a reputation for tweeting, so we turned to Twitter as our social media source. To extract relevant tweets we used big data analytics software called Crimson Hexagon and a ‘taxonomy’, a set of selection criteria (see below). We wanted to detect any cases of people returning from the Middle East with respiratory symptoms, because many Indonesians travel to the Middle East on pilgrimage (Hajj/Umrah) and then return home (pulang).
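The study's actual taxonomy is not reproduced in this text, so the following is only a minimal sketch of what a selection rule of this kind might look like: keep a tweet if it mentions travel or pilgrimage AND a respiratory symptom or MERS. The term lists and the AND/OR structure are hypothetical assumptions, and this Python function does not reproduce Crimson Hexagon's own query syntax.

```python
import re

# Hypothetical term lists for illustration only; the study's real taxonomy is not shown here.
TRAVEL_TERMS = ["umrah", "umroh", "haji", "hajj", "pulang", "arab saudi", "mekkah"]
SYMPTOM_TERMS = ["batuk", "demam", "sesak napas", "pneumonia", "mers", "korona"]


def contains_any(text: str, terms: list[str]) -> bool:
    """Return True if any term occurs in text as a whole word or phrase (case-insensitive)."""
    lowered = text.lower()
    return any(re.search(r"\b" + re.escape(term) + r"\b", lowered) for term in terms)


def matches_taxonomy(text: str) -> bool:
    """Keep a tweet only if it mentions travel AND a symptom or MERS-related term."""
    return contains_any(text, TRAVEL_TERMS) and contains_any(text, SYMPTOM_TERMS)


tweets = [
    "Teman saya baru pulang umroh, sekarang demam dan batuk",
    "Jadwal keberangkatan umrah bulan depan",
    "Batuk pilek karena cuaca",
]
kept = [t for t in tweets if matches_taxonomy(t)]
print(kept)  # ['Teman saya baru pulang umroh, sekarang demam dan batuk']
```

Requiring both a travel term and a symptom term drops the many tweets that mention only pilgrimage travel or only a cough, at the cost of missing tweets that phrase the link indirectly.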
We also used an additional taxonomy to exclude tweets in languages other than Indonesian (e.g. Malay).
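One simple way to approximate a language-exclusion taxonomy is to drop tweets that contain spellings specific to Malay. The marker words here (Malay ‘kerana’ vs Indonesian ‘karena’, ‘ubat’ vs ‘obat’, ‘mahu’ vs ‘mau’) are illustrative assumptions, not the list used in this study; a real filter would need a much larger list or a proper language-identification model.

```python
import re

# Hypothetical Malay spellings that differ from standard Indonesian.
MALAY_MARKERS = {"kerana", "ubat", "mahu"}


def tokens(text: str) -> set[str]:
    """Lower-case alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def is_probably_malay(text: str) -> bool:
    """Flag a tweet if it contains any Malay-specific marker word."""
    return not MALAY_MARKERS.isdisjoint(tokens(text))


def exclude_malay(tweets: list[str]) -> list[str]:
    """Keep only tweets that are not flagged as Malay."""
    return [t for t in tweets if not is_probably_malay(t)]


print(exclude_malay([
    "Dia demam kerana baru balik umrah",
    "Dia demam karena baru pulang umrah",
]))  # ['Dia demam karena baru pulang umrah']
```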
When we applied these taxonomies to all publicly available tweets (the ‘Twitterverse’), 6,492 tweets matched our selection criteria. We then quickly trained the software's classification algorithm to label tweets as relevant or irrelevant, and carried out both automated and manual analyses on the 4,936 tweets it classified as relevant.
We found that although the majority of tweets we picked up were from news sources, we also detected tweets from individuals – for example “recently my friend was asking me about MERS symptoms. He just got back from Umrah and is having respiratory symptoms similar to pneumonia”.
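The classifier built into Crimson Hexagon is not described in this post. As a stand-in, the sketch below trains a multinomial naive Bayes classifier (a swapped-in technique, not the platform's method) on a handful of hand-labelled tweets and then labels new tweets as relevant or irrelevant. The training examples are invented for illustration; with this little training data, some misclassification is to be expected.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts: list[str], labels: list[str]) -> "NaiveBayes":
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set().union(*self.word_counts.values())
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, text: str) -> str:
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        best_class, best_score = None, -math.inf
        for c in self.classes:
            # Log prior plus the smoothed log likelihood of every token in the tweet.
            score = math.log(self.doc_counts[c] / n_docs)
            for word in tokenize(text):
                score += math.log((self.word_counts[c][word] + 1) / (self.total_words[c] + v))
            if score > best_score:
                best_class, best_score = c, score
        return best_class


# Invented hand-labelled examples (not data from this study).
train_texts = [
    "teman baru pulang umrah batuk sesak napas",
    "ayah pulang haji demam tinggi dan batuk",
    "promo paket umrah murah bulan ini",
    "daftar haji antre puluhan tahun",
]
train_labels = ["relevant", "relevant", "irrelevant", "irrelevant"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("kakak pulang umrah langsung batuk dan demam"))  # relevant
print(model.predict("promo paket umrah murah"))  # irrelevant
```

Naive Bayes is a common baseline for short-text classification because it trains instantly on small labelled sets, which suits a rapid-response setting, although it ignores word order.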
Using big data analytic methods, we were able to get real-time snapshots of trending topics (headline photo). Manual analysis helped us look at the events being discussed in more detail. The graph below shows how people who had returned from the Middle East with respiratory symptoms featured on social media between 5 and 13 May (shown for Java, the most populous island in Indonesia).
It is important to note that although these ‘suspected’ cases and deaths were linked with MERS on social media, they are not confirmed and could be wholly inaccurate. To the best of our knowledge, as of 20 May 2014 no MERS CoV cases have been confirmed in Indonesia. Please consult official sources such as WHO’s Disease Outbreak News for information about MERS CoV.
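A daily time series like the one plotted for Java can be built by counting classified tweets per day within a date range. This sketch assumes each relevant tweet has already been reduced to a (date, province) pair; in practice location is available for only some tweets, and both the record format and the province list are assumptions. The records are invented for illustration and are not data from this study.

```python
from collections import Counter
from datetime import date, timedelta

# Assumption: the six provinces on the island of Java.
JAVA_PROVINCES = {"DKI Jakarta", "Banten", "Jawa Barat", "Jawa Tengah", "DI Yogyakarta", "Jawa Timur"}


def daily_counts(records, start: date, end: date, provinces=JAVA_PROVINCES) -> dict[date, int]:
    """Count (date, province) records per day in [start, end], keeping days with zero tweets."""
    counts = Counter(day for day, prov in records if prov in provinces and start <= day <= end)
    n_days = (end - start).days + 1
    days = [start + timedelta(days=i) for i in range(n_days)]
    return {day: counts[day] for day in days}


# Invented records for illustration only.
records = [
    (date(2014, 5, 6), "Jawa Barat"),
    (date(2014, 5, 6), "Jawa Timur"),
    (date(2014, 5, 7), "DKI Jakarta"),
    (date(2014, 5, 7), "Sumatera Utara"),  # outside Java: ignored
    (date(2014, 5, 20), "Banten"),         # outside the date range: ignored
]
series = daily_counts(records, date(2014, 5, 5), date(2014, 5, 13))
for day, n in series.items():
    print(day.isoformat(), n)
```

Keeping zero-count days in the output matters for plotting: a gap in the series then shows up as a real zero rather than being silently skipped.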
What were the strengths and limitations of our method?
We were able to find information about MERS CoV, and big data methods let us find it quickly.
However, because we had to mobilize quickly:
- Our taxonomy was developed rapidly, so it could likely be improved with further refinement
- We trained our classification algorithm on only a small subset of the data, so we likely missed some tweets through misclassification
- The software allowed us to download the content of only 1,000 tweets per day. On 7 May the number of relevant tweets (1,471) exceeded this limit, so we could manually check only about two thirds of that day's tweets
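The effect of the daily download limit on manual checking is simple arithmetic: the fraction of a day's relevant tweets that could be checked is the cap divided by that day's count, capped at 100%. For 7 May this gives 1000 / 1471, or roughly 68%. The function name below is hypothetical.

```python
DAILY_DOWNLOAD_CAP = 1000  # tweets per day, the software limit reported in this post


def manual_check_coverage(n_relevant: int, cap: int = DAILY_DOWNLOAD_CAP) -> float:
    """Fraction of a day's relevant tweets whose content could be downloaded for manual checking."""
    if n_relevant <= 0:
        return 1.0  # nothing to check, so nothing is missed
    return min(1.0, cap / n_relevant)


print(f"7 May: {manual_check_coverage(1471):.0%} of relevant tweets checked")  # 7 May: 68% of relevant tweets checked
```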
So, is this type of information useful for MERS CoV surveillance in Indonesia?
We don’t know yet, and we are leaving that decision to disease surveillance experts. For this type of data to be useful, it needs to add value to existing data. The questions they are likely to consider include: Does it provide additional information? Is it relevant? Is it any more timely? Is there capacity in place to process and respond to the additional information? Would it help disease surveillance teams process information, or would it put an additional burden on them?
A question for you: if you publicly tweeted about your disease symptoms and (with your best interests in mind) you were contacted directly by a health authority about it – how would you feel?