Information on the spatial distribution of populations and their international movement is an integral aspect of global development.
Migrants contribute to development in their communities through the new norms and ideas they bring and send home, and through their economic activity. One of the most tangible outcomes of migration is remittances: by working abroad and sending money back to their families at home, migrant workers play an important role in enhancing development and reducing poverty in countries of origin, and contribute towards prosperity in countries of destination.
From informing the planning of transportation systems and access to basic social services to driving economic, cultural and environmental changes, access to reliable and timely data on international migration flows is critical for policy formulation, planning and implementation, and for monitoring and evaluation.
However, while many countries have made great progress in obtaining accurate migration figures through censuses, demographic and thematic surveys and administrative registers, significant shortcomings remain in current international migration statistics. There is an information gap regarding the demographics of those migrating, where migrants come from and where they go, and when they move.
Traditional migration data are often outdated and are usually consolidated only once a year. Definitions of migrants and data collection processes are inconsistent between countries. Socio-demographic components, especially gender, are rarely tracked. In some cases migration data are simply nonexistent.
To address this lack of information, a new field of research has emerged: alternative data sources, or ‘big data’, have been explored as proxies for international migration flows. While this field is still young, the evidence is promising. This blog post aims to provide a snapshot of state-of-the-art research on the use of big data to understand international migration.
Using Big Data – classification and assessment
We can categorize relevant big data sources for international migration into two types:
- Big data sources whose primary usage involves geolocation, such as Facebook tags, Twitter, Flickr, Foursquare and mobile GPS data.
- Big data sources where the location component is not part of their primary usage, or ‘data exhaust’: digital transactions such as financial services (including purchases, money transfers, savings and loan repayments), communications services (such as anonymized records of mobile phone usage patterns) or information services (such as anonymized records of search queries, for example: ‘moving to Australia’, ‘how to get a work permit in Germany’).
There are also common challenges in using big data in this field:
- Lack of ground truth data: As introduced above, because reliable traditional data on international migration flows are scarce, it is very difficult to test the applicability of a new data source as a proxy.
- Lack of big data sharing/access: Private sector companies which hold a great deal of this data are not incentivised and/or don’t have the mechanisms in place to make it available for analysis through open data protocols, “data philanthropy” partnerships, or other means.
- Big data privacy issues: These data, if not anonymized and aggregated properly, carry the risk of re-identification and use of personal information.
- Selection bias: Most big data sources are limited to online users, who are generally younger, wealthier and more urban than the population as a whole.
- Complexity of big data modelling: By their very nature, newer, larger and more complex datasets require more advanced analytical techniques.
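The privacy challenge in the list above mentions aggregation. As a minimal sketch, assuming individual records have already been stripped of identifiers and reduced to hypothetical (origin, destination) pairs, one common precaution is to publish only counts and to suppress small cells. The function name `aggregate_flows` and the threshold of 10 are illustrative assumptions, not a standard, and suppression alone does not guarantee that re-identification is impossible.

```python
from collections import Counter


def aggregate_flows(records, min_count=10):
    """Aggregate (origin, destination) pairs into counts.

    Cells with fewer than `min_count` people are dropped, because very
    small groups are easier to re-identify. The threshold is an
    illustrative assumption and should be set by a privacy review.
    """
    counts = Counter(records)
    return {pair: n for pair, n in counts.items() if n >= min_count}


# Toy input, invented for illustration only.
records = [("A", "B")] * 12 + [("A", "C")] * 3
safe_flows = aggregate_flows(records, min_count=10)
# The small ("A", "C") cell is suppressed; only ("A", "B") remains.
```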
Evidence of big data for international migration
Recent research projects have highlighted the potential of using big data to understand international migration phenomena:
With the proliferation of mobile social media apps with check-in functions, there is a growing quantity of new geolocation data. Geolocated Tweets have been used to infer international migration trends and the relationship between internal and international migrations (Zagheni et al., 2014). While the trends could not be calibrated with official migration data, results showed, among other findings, that Twitter data could predict turning points in migration trends.
The Facebook Data Science team (2013) studied aggregated, anonymized data on all Facebook users who list both their hometown and their current city on their Facebook profile and looked at coordinated migration [a flow of population from city A (hometown) to another city B (current city) is considered a coordinated migration if, among the cities where people from hometown A currently live, city B is the one with the largest number of individuals whose hometown is A]. The study highlighted how Facebook data can be used to study human mobility, in particular with the possibility of mapping internal and international migrations alongside each other.
Figure: Coordinated migration worldwide (The Facebook Data Science team, 2013)
Figure: Urbanization growth between 2000 and 2012 for the top coordinated migration destinations (The Facebook Data Science team, 2013)
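The definition of coordinated migration given above can be sketched in a few lines. This is an illustration of the definition as worded in this post, not the Facebook team's actual code. The input format (hypothetical `(hometown, current_city)` pairs) and the choice to ignore people who still live in their hometown are assumptions.

```python
from collections import Counter, defaultdict


def coordinated_migrations(profiles):
    """Return, for each hometown, its top destination and the head count.

    profiles: iterable of (hometown, current_city) pairs.
    Assumption: people whose current city equals their hometown have
    not migrated and are ignored.
    """
    by_hometown = defaultdict(Counter)
    for hometown, current_city in profiles:
        if hometown != current_city:
            by_hometown[hometown][current_city] += 1
    # most_common(1) yields the destination with the largest count;
    # ties are broken by first appearance in the input.
    return {
        hometown: destinations.most_common(1)[0]
        for hometown, destinations in by_hometown.items()
    }


# Toy input, invented for illustration only.
profiles = [("A", "B"), ("A", "B"), ("A", "C"), ("A", "A"), ("D", "C")]
flows = coordinated_migrations(profiles)
```

With this toy input, the flow from A to B counts as coordinated because B hosts more people from A than any other city does.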
Still on the subject of geolocation, where people log in to use their email may also be an interesting proxy for migration. In one study, the IP addresses from Yahoo! email accounts of an initial sample of 100 million users were used to predict and analyze flows of migrants and tourists, and patterns of back-and-forth movement between country of origin and country of residence (State et al., 2013).
This data set was also used to estimate the age and gender distribution of migration flows (Zagheni and Weber, 2012).
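To make the idea of using login locations as a proxy concrete, here is a deliberately simplified sketch. It is not the method used in the studies cited above: it assumes each login has already been mapped to a hypothetical `(period, country)` pair, takes the most frequent login country in each period as the likely country of residence, and flags a move when that country changes between consecutive observed periods.

```python
from collections import Counter, defaultdict


def modal_country_by_period(logins):
    """Map each period to the country with the most logins in it.

    logins: iterable of (period, country) pairs for one anonymized user.
    """
    per_period = defaultdict(Counter)
    for period, country in logins:
        per_period[period][country] += 1
    return {
        period: counts.most_common(1)[0][0]
        for period, counts in sorted(per_period.items())
    }


def infer_moves(logins):
    """Return (period, from_country, to_country) for each inferred move."""
    residence = modal_country_by_period(logins)
    periods = list(residence)
    moves = []
    for previous, current in zip(periods, periods[1:]):
        if residence[previous] != residence[current]:
            moves.append((current, residence[previous], residence[current]))
    return moves


# Toy input, invented for illustration only.
logins = [(2011, "MX"), (2011, "MX"), (2011, "US"), (2012, "US"), (2012, "US")]
moves = infer_moves(logins)
```

A single login abroad in 2011 does not change the inferred residence, which is one crude way to separate short trips from moves; real analyses need far more careful rules.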
As introduced above, one caveat regarding big data sources is the selection bias introduced by the user base. Search engines are less prone to selection bias as they are much more widely used than social media, and represent the gateway for virtually all internet users to access new information. Another interesting aspect is that search terms capture what people are interested in doing or are planning to do, rather than what they have actually done. Sometimes this focus on intentions is problematic, but in some ways it can be illuminating as a new and different measure of people’s behaviour. For example, official unemployment statistics may decrease as people find work. However, the online activity of underemployed people searching for new opportunities may give a more meaningful measure of the health of the labour market.
The exploration of web searches for approximating international migration has so far been limited. Researchers from the UK Office for National Statistics (Williams & Ralphs, 2013) looked at national and sub-regional patterns of in-migration from EU8 countries to the UK, and at the language of their searches.
Williams and Ralphs found that Google searches for ‘Polski’ were closely related to statistics on Polish nationals in the UK. The results were also promising for migration from Romania and Lithuania, but less so for the other countries in scope.
Another preliminary study investigated the viability of using Google searches to approximate ground truth data on changes in the nationalities of residents in Spanish cities. Some correlation was found between the volume of search terms and the number of residents from some South American countries. However, a shortcoming of the analysis was that the recorded volume of each relevant search term decreased over time because of growth in the use of other search terms.
Both studies were very preliminary, and further work needs to be done to explore the potential of using web searches for international migration.
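The kind of comparison both studies describe, search volume set against official figures, often starts with a simple correlation. The sketch below computes a Pearson correlation coefficient in plain Python. The two series are made-up placeholder numbers, not data from either study, and a high correlation on its own would not show that searches are a reliable proxy.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (spread_x * spread_y)


# Placeholder values, invented for illustration only.
search_index = [10, 20, 30, 40]          # hypothetical search volume index
official_counts = [105, 195, 310, 390]   # hypothetical official figures
r = pearson(search_index, official_counts)
```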
Finally, big data has proven to be promising for looking specifically at international travel flows. For example, geotagged photos from Flickr represent an exciting new data source for predicting travel behaviour (Clements et al., 2010).
While mobile phone data is most traditionally used to study local movements for various development applications, it is also used to inform the tourism industry. In a recent Spanish study (Telefónica and RocaSalvatella, 2014), data about the activities of foreign mobile handsets that used a Spanish operator in Madrid and Barcelona was combined with data on electronic payments made with foreign bank cards. Cross-referencing these two data sets made it possible to extract various key indicators for the tourism industry, such as length of stay by country of origin, average daily spending and cumulative spending over the entire stay.
In parallel, Ahas et al. (2014) shared their experience of trying to overcome privacy and regulatory, financial and business, and technological barriers to using mobile phone data for tourism statistics in Europe (Eurostat, 2012).
What’s next?
The need for reliable international migration data will only increase as changing demographics, globalization and climate change increase migration pressures across borders. Previous research highlights the potential for using big data as a proxy for international migration, but there is still a lot of work to be done in validating methodologies and embedding them into the process of building new indicators and using them to inform policies and programmes.
At Global Pulse we are encouraged by this promising evidence and we welcome data scientists and practitioners willing to collaborate on research to make progress in this field. Above all, we will continue to work to ensure that privacy principles are respected and that ethical considerations are taken into account, so that big data does not adversely affect the human rights of migrants, in accordance with international law.
Headline image: Migration flows to Australia, People Movin project