Using Big Data – classification and assessment
- Big data sources where geolocation is part of the primary usage, such as Facebook location tags, geotagged Twitter and Flickr posts, Foursquare check-ins and GPS data from mobile devices.
- Big data sources where location is not part of the primary usage, often called ‘data exhaust’: digital transactions such as financial services (including purchases, money transfers, savings and loan repayments), communications services (such as anonymized records of mobile phone usage patterns) or information services (such as anonymized records of search queries, for example ‘moving to Australia’ or ‘how to get a work permit in Germany’).
- Lack of ground truth data: As introduced above, reliable traditional data on international migration flows are scarce, so it is very difficult to test whether new data sources are applicable as proxies.
- Lack of big data sharing/access: Private sector companies that hold much of this data often lack the incentives and/or the mechanisms to make it available for analysis, whether through open data protocols, “data philanthropy” partnerships or other means.
- Big data privacy issues: If these data are not properly anonymized and aggregated, they carry the risk of individuals being re-identified and their personal information being misused.
- Selection bias: Most big data sources cover only users of online and digital services, who are generally younger, wealthier and more urban than the population as a whole.
- Complexity of big data modelling: By their very nature, newer, larger and more complex datasets require more advanced analytical techniques.
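To make the privacy point concrete, the sketch below shows one common way of "aggregating properly": individual records are collapsed into origin-destination counts of distinct users, and any count below a minimum cell size is suppressed before publication. This is an illustrative sketch, not a method from the source. The record layout `(user_id, origin, destination)`, the function name `aggregate_flows` and the threshold `MIN_CELL_SIZE = 10` are all hypothetical assumptions. Real disclosure-control rules vary by data provider and jurisdiction.

```python
from collections import Counter

# Hypothetical minimum cell size: counts below this are suppressed.
# The value 10 is an illustrative assumption, not an established standard.
MIN_CELL_SIZE = 10


def aggregate_flows(records, min_cell_size=MIN_CELL_SIZE):
    """Aggregate hypothetical (user_id, origin, destination) records into
    origin-destination counts of distinct users, suppressing small cells."""
    # Count each user at most once per origin-destination pair.
    unique_pairs = {(user, origin, dest) for user, origin, dest in records}
    counts = Counter((origin, dest) for _, origin, dest in unique_pairs)
    # Drop cells too small to publish safely; user IDs never leave this function.
    return {pair: n for pair, n in counts.items() if n >= min_cell_size}


if __name__ == "__main__":
    # 12 distinct users moving DE -> AU, and one user (recorded twice) moving FR -> AU.
    records = [(f"u{i}", "DE", "AU") for i in range(12)]
    records += [("u99", "FR", "AU"), ("u99", "FR", "AU")]
    print(aggregate_flows(records))
```

In this example the FR to AU flow represents a single person, so it is withheld from the output, while the DE to AU flow of 12 distinct users is published. Suppression alone does not guarantee anonymity (for example, differences between overlapping tables can still leak information), so it is usually combined with other safeguards.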
Evidence of big data for international migration
Headline image: Migration flows to Australia, People Movin project