Sifting Through, and Making Sense of, Big Social Data


Global Pulse is very pleased to officially announce a data philanthropy partnership with DataSift – a social data platform that enables aggregation and filtering of social media posts to extract insights.

DataSift is partnering with us to provide access to its platform to enable ongoing R&D to learn how social data can provide insights on humanitarian and global development issues. For example, data analysts like myself and our data scientists working in innovation labs in New York, Jakarta and Kampala are using the DataSift platform to find data from millions of public social media posts related to issues such as disease, unemployment and food security. This will help us develop successful methodologies and use cases more quickly, and unearth best practices for using social data analysis in global development and humanitarian applications.

Today I wanted to share what we’ve learned so far about sifting through and filtering millions of posts to identify trends in attitudes or behavior.

Why Social Media?

The techniques of mining social media were first developed in academia, specifically among artificial intelligence researchers in computer science departments. The methods were then adopted and deployed within the private sector, where market research is a long-standing practice, and social media became a new opportunity to understand the motivations and attitudes of customers and to feed that understanding into advertising.

But understanding people’s attitudes, and benchmarking levels of chatter to monitor changes, is also very useful for understanding the impact of social protection policies or whether advocacy campaigns are making a noticeable difference.

The practice of social media mining is still expanding – with new methodologies, ethical standards, technology tools and platforms being developed all the time. For example, sentiment analysis is not yet a robust technique; due to the subtleties of languages, machines still have a hard time detecting humor, sarcasm and irony. It might be another couple of years until gauging sentiment using technology offers reliable indicators, but in the meantime, sentiment analysis tools can still be of interest to practitioners with deep subject knowledge who can contextualise the trends they see, alongside other sources of information.

Creating a Social Media Taxonomy about Post-2015 Global Development Priorities

What is unique and valuable about using big, social data in the fields of global development or public policy is the ability to tune into the changing volumes of certain topics, as expressed in everyday language. This is different from simple social media analytics such as counting hashtags or mentions. This type of analysis requires filtering through social media content for targeted groups of keywords – otherwise known as a taxonomy.
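To make the idea of tracking topic volumes concrete, here is a minimal Python sketch. It is only an illustration: the two-topic `TAXONOMY`, the sample posts and the function names are all made up for this example, and it is not how the DataSift platform works internally.

```python
from collections import Counter

# Hypothetical, tiny taxonomy: topic -> keywords (not the project's real lists).
TAXONOMY = {
    "education": {"school", "tuition", "teacher"},
    "jobs": {"unemployed", "job", "salary"},
}


def topics_for(text):
    """Return the topics whose keywords appear as whole words in the text."""
    words = set(text.lower().split())
    return {topic for topic, keywords in TAXONOMY.items() if words & keywords}


def monthly_volumes(posts):
    """Count posts per (month, topic); posts are (ISO date, text) pairs."""
    counts = Counter()
    for date, text in posts:
        month = date[:7]  # "2013-05-14" -> "2013-05"
        for topic in topics_for(text):
            counts[(month, topic)] += 1
    return counts


# Invented sample posts, for illustration only.
posts = [
    ("2013-05-14", "Tuition went up again this year"),
    ("2013-05-20", "Still unemployed and school fees are due"),
    ("2013-06-02", "Finally got a job"),
]
volumes = monthly_volumes(posts)
```

A post can count towards more than one topic, which is why the second sample post is tallied under both "education" and "jobs". Comparing the counts month by month is what lets an analyst see a topic rising or falling.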

Building a good taxonomy is a collaborative and iterative process. It requires an understanding of the topic under consideration, so that one can identify the keywords people might use when discussing it (for example, conversations about education might also include the word “tuition”). It also requires language skills and the contextual knowledge to understand shorthand and local slang.

For example, we created an extensive taxonomy of keywords for a current project with the UN Millennium Campaign called “The Post-2015 Global Conversation.” The project filters all publicly posted Tweets for approximately 25,000 keywords and combinations of keywords related to 16 global development topics in French, English, Spanish and Portuguese. It yields about 10 million relevant new tweets each month.

We are using this taxonomy as a complement to the UN’s MY World Survey as a way to help the Millennium Campaign hear from everyday people about their concerns and priorities, even if they are not involved in the official Post-2015 political process. Since the 16 categories are the same as the ones in the UN’s MY World Survey, our task was to develop a list of words that people might commonly use when talking about those topics:

  • A good education
  • Access to clean water and sanitation
  • Action taken on climate change
  • Affordable and nutritious food
  • An honest and responsive government
  • Better healthcare
  • Better job opportunities
  • Better transport and roads
  • Equality between men and women
  • Freedom from discrimination
  • Phone and internet access
  • Political freedoms
  • Protecting forests, rivers and oceans
  • Protection against crime and violence
  • Reliable energy at home
  • Support for people who can’t work

DataSift’s platform makes it possible to create complex queries to find trends in very specific areas. In this instance, the 16 topical taxonomies were created purely by hand, with our domain experts coming up with lists of relevant words. We then added common misspellings, accented and unaccented forms, different genders, tenses and so on to the filter. Lastly, we tested the resulting keyword sets for signal and tried to improve accuracy by using logical operators. We ended up with a search query script spanning 1786 lines.
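The expansion step described above can be sketched in a few lines of Python. This is a hedged illustration rather than our actual script: the `expand` function, its `extra_variants` argument and the Spanish sample words are assumptions made for the example. Unaccented forms are generated automatically, while misspellings and gendered forms still have to be supplied by hand.

```python
import unicodedata


def strip_accents(word):
    """Remove combining accent marks, e.g. 'educación' -> 'educacion'."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def expand(keywords, extra_variants=None):
    """Return each keyword plus its unaccented form and hand-listed variants.

    extra_variants is a hypothetical mapping from a keyword to the
    misspellings or gendered forms a domain expert has listed for it.
    """
    extra_variants = extra_variants or {}
    variants = set()
    for word in keywords:
        word = word.lower()
        variants.add(word)
        variants.add(strip_accents(word))
        variants.update(extra_variants.get(word, []))
    return variants


# Invented sample input, for illustration only.
variants = expand(
    ["educación", "maestro"],
    {"educación": ["educasion"], "maestro": ["maestra"]},
)
```

Here the two seed words grow into five search terms. Applied across thousands of keywords in four languages, this kind of expansion is one reason a hand-built query can run to well over a thousand lines.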

As the focus is on everyday language, we do not just look for phrases like “gender equality” or “women’s empowerment”, but also for words or phrases like “glass ceiling”, “women don’t get promoted”, or any combination of the words “girls”, “access”, and “school”. The result is staggering.
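One way to picture this kind of rule is as "any of these exact phrases, OR all of these words together". The Python sketch below shows that logic under stated assumptions: `PHRASES`, `ALL_OF` and `matches` are names invented for the example, the rule covers only the handful of terms quoted above, and it is written in plain Python rather than in the query language used on the DataSift platform.

```python
import re

# Illustrative rule for one topic, using only the examples quoted in the text.
PHRASES = ["gender equality", "glass ceiling", "women don't get promoted"]
ALL_OF = [{"girls", "access", "school"}]  # every word in a group must appear


def tokenize(text):
    """Lowercase the text, normalise curly apostrophes and split into words."""
    return re.findall(r"[a-z']+", text.lower().replace("\u2019", "'"))


def matches(text):
    """True if the text contains any phrase, or all words of any group."""
    tokens = tokenize(text)
    # Pad with spaces so phrases only match on whole-word boundaries.
    padded = " " + " ".join(tokens) + " "
    if any(" " + phrase + " " in padded for phrase in PHRASES):
        return True
    token_set = set(tokens)
    return any(group <= token_set for group in ALL_OF)
```

With this rule, "She hit the glass ceiling again" matches on a phrase and "Girls here still lack access to a school" matches on the word combination, while "The school bus was late" does not match, because "school" alone is not enough. Requiring several words together is how logical operators cut down on false positives.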

As social media is used more and more, we get more and more content from a larger and larger part of the world. Below is a graph that shows which of the 16 topics mentioned above is talked about the most on a global scale. On the right-hand side, you can see in which months we found the largest number of relevant tweets.

Graph: Bipartite graph created using RAW

Many widely used social media monitoring tools query for one or several hashtags, but by taking the approach described above we can gain more sophisticated insight into a given policy area (be that sanitation, food security, gender issues, public health or many other subjects).

We’re keen to hear from others experimenting in this area, so do leave a comment and let us know if you are using social media mining in the fields of global development or public policy.
