This article is part of a special Global Pulse Guest Blogger Series: “Data Mining for Development: Methodological Innovations & Challenges.”
Vanessa Frias-Martinez, PhD, is a scientific researcher in the Data Mining and User Modeling Group at Telefonica Research in Madrid, Spain. Her main interest is in technologies for emerging markets and sustainable development. Her research combines data mining and machine learning techniques to analyze digital footprints of cell phone users in emerging economies and has created. Enrique Frias-Martinez, PhD, leads the Smart Cities and Mobility Applications Initiative at Telefonica Research. His current research interests include urban and pervasive computing, technologies for developing countries, data mining and machine learning.
Cell phones have become one of the most pervasive technologies, reaching urban and rural populations across the socio-economic spectrum, and as such can be considered one of the main sensors of human behaviour. For example, in Latin America, over 170 million customers use Telefonica’s* cellular networks every day, generating vast amounts of behavioral data that can provide individual and collective fingerprints of mobility patterns and communication schemes.
Traditionally, telecommunication companies have used such information for business intelligence applications: by getting to know a customer better, the operator can provide better services. Nevertheless, the implications of such information go beyond business intelligence. It can have a deep impact on living conditions by providing tools that help monitor social problems in a cost-effective way.
With that idea in mind, at Telefonica Research we focus on analyzing ways in which cell phone data can be used for social good. We believe that the analysis of calling behavioral patterns can give an understanding of how citizens interact with their environments, providing critical information relevant to areas like urban planning, crisis management or global health. For example, cell phone data can help policy makers explore the mobility patterns of urban populations and propose new public transportation routes that assist citizens with their daily commutes. Similarly, cell phone data can allow institutions to map and understand, in near real time, the migrations provoked by natural disasters. As a result, citizens might receive aid and regain access to social services more quickly.
Inspired by these ideas, I'd like to describe two tools (CenCell and AlertImpact) that we have developed to help decision makers in public and private institutions better assess their policy decisions and so improve the quality of life in communities across Latin America. CenCell has been designed to compute affordable census maps, and AlertImpact to evaluate the impact of public health alerts on epidemic spreading. Although we put a special emphasis on enhancing policy decisions across Latin America, where Telefonica has a large presence, the techniques behind our tools are applicable to cell phone data from any telecommunications company worldwide.
Our tools build on the information collected by cell phone networks. A cell phone network is built from a set of cell towers, also called Base Transceiver Stations (BTSs), that connect cell phones to the network. Each BTS has a latitude and a longitude, indicating its geolocation, and gives cellular coverage to an area called a cell. Cell phone data, also known as Call Detail Records (CDRs), are generated whenever a mobile phone connected to the network uses a service, i.e., voice, data, SMS or MMS. CDRs are saved by telecommunication companies for billing purposes and include the encrypted caller and callee cell phone numbers, the time and date of the call, its duration, and the geolocations of the cell towers (BTSs) where the interaction took place.
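To make the shape of this data concrete, here is a minimal Python sketch of a CDR. The class and field names (`BTS`, `CDR`, `caller_bts`, and so on) are our own illustrative assumptions, not an operator's actual schema, and the sample values are invented.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class BTS:
    """A cell tower, identified here only by its geolocation."""
    latitude: float
    longitude: float


@dataclass(frozen=True)
class CDR:
    """One Call Detail Record (field names are hypothetical)."""
    caller: str        # encrypted caller number
    callee: str        # encrypted callee number
    start: datetime    # date and time of the interaction
    duration_s: int    # duration in seconds
    service: str       # "voice", "data", "SMS" or "MMS"
    caller_bts: BTS    # tower serving the caller
    callee_bts: BTS    # tower serving the callee


# An invented record, for illustration only.
record = CDR(
    caller="a1f3",
    callee="9c2e",
    start=datetime(2009, 4, 27, 9, 30),
    duration_s=120,
    service="voice",
    caller_bts=BTS(19.43, -99.13),
    callee_bts=BTS(19.36, -99.18),
)
```

Note that nothing in the record identifies a person directly: the phone numbers are encrypted, and location is only known at the granularity of a tower.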
CDRs can be used to model a variety of behavioural variables, which can be grouped into three categories (see Figure 1): (1) consumption, (2) social and (3) mobility variables.
Consumption variables capture general usage characteristics such as the average number of calls or expenses. Social variables capture factors relevant to the social networks that individuals build with their cell phones, including the number of contacts or the strength of their communication ties. Finally, mobility variables characterize spatio-temporal mobility patterns, such as the geographical areas where individuals spend most of their time or the areas where a person and her social contacts typically move around. Our tools apply data mining and machine learning techniques to these variables.
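As a rough illustration of the three groups of variables, the sketch below derives a few of them for one user from a handful of invented records. The tuple layout, the function name `behaviour_variables` and the specific variables chosen are assumptions made for this example; the variables used in our tools are richer than this.

```python
from collections import Counter

# Toy records: (caller, callee, duration in seconds, tower id). All values invented.
records = [
    ("u1", "u2", 60, "A"),
    ("u1", "u2", 120, "A"),
    ("u1", "u3", 30, "B"),
    ("u2", "u1", 90, "C"),
]


def behaviour_variables(records, user):
    """Compute a few example variables from the calls made by `user`."""
    own = [r for r in records if r[0] == user]
    contacts = Counter(r[1] for r in own)   # calls per contact
    towers = Counter(r[3] for r in own)     # calls per tower
    return {
        # Consumption: how much the phone is used.
        "calls": len(own),
        "avg_duration_s": sum(r[2] for r in own) / len(own) if own else 0.0,
        # Social: size of the contact network and its strongest tie.
        "contacts": len(contacts),
        "strongest_tie": contacts.most_common(1)[0][0] if contacts else None,
        # Mobility: the tower where the user is seen most often.
        "top_tower": towers.most_common(1)[0][0] if towers else None,
    }


variables = behaviour_variables(records, "u1")
```

Here the strength of a tie is approximated by the number of calls to a contact, which is only one of several plausible definitions.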
Figure 1: Tools that provide behavioral information, based on cell phone data, to public and private institutions working for social good.
CenCell is a tool for computing affordable census maps from anonymized call detail records [3, 4]. Census maps gather large amounts of information about the socio-economic status of households at a national scale. These maps contain information that characterizes various social and economic aspects, like the educational level of the citizens or their access to electricity. The accuracy of these maps is critical, given that many policy decisions made by governments and international organizations are based on variables measured through census maps. National Statistical Institutes (NSIs) compute such maps every five to ten years, and typically require a large number of enumerators who carry out interviews to gather information about the main socio-economic characteristics of each household. All these prerequisites make the computation of census maps highly expensive, especially for budget-constrained emerging economies. To reduce costs, some countries have cut both the number of interview questions and the number of citizens interviewed, which unfortunately affects the quality of the final census information.
Figure 2: CenCell approximates the socio-economic level of regions using anonymized Call Detail Records.
To overcome these limitations, CenCell determines the socio-economic level of a region based on the aggregated cell phone behavioral patterns of its citizens (see Figure 2). At its core, CenCell consists of a classification algorithm that requires both cell phone data and a certain amount of census information to bootstrap the tool. CenCell’s main contribution is that it significantly decreases the workload of the enumerators by reducing the number of geographical areas that need to be covered through household interviews. Once the tool is bootstrapped, National Statistical Institutes can use CenCell to approximate the socio-economic level of regions not covered by the enumerators, thus saving on the budget allocated for the computation of census maps. We have tested CenCell’s algorithms using cell phone data and socio-economic information from a city in an emerging economy in Latin America. Our results show that the socio-economic levels assigned by CenCell using cell phone traces are very good approximations of the original values captured by the local NSI. This makes it possible to compute socio-economic levels at a fraction of the original cost, and more often, so that the impact of the decisions made for each specific geographical area can be evaluated more frequently.
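The bootstrapping idea can be sketched in a few lines of Python. This post does not describe which classification algorithm CenCell uses, so the nearest-centroid classifier below is a simple stand-in chosen for brevity, and the feature values and the labels "low" and "high" are invented for illustration.

```python
from math import dist  # Euclidean distance, Python 3.8+


def fit_centroids(features, labels):
    """Average the feature vectors of the surveyed regions, per census label."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict(centroids, x):
    """Assign a region the label of the closest centroid."""
    return min(centroids, key=lambda y: dist(centroids[y], x))


# Regions covered by enumerators: aggregated cell phone features plus
# the socio-economic level recorded in the census (all values invented).
surveyed = [[10.0, 2.0], [12.0, 2.5], [40.0, 8.0], [44.0, 9.0]]
census_level = ["low", "low", "high", "high"]

centroids = fit_centroids(surveyed, census_level)

# Regions not covered by enumerators are labelled from phone data alone.
estimate_a = predict(centroids, [15.0, 3.0])
estimate_b = predict(centroids, [38.0, 7.5])
```

The surveyed regions play the role of the census information needed to bootstrap the tool; every region labelled by `predict` is one that enumerators would not need to visit.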
In the area of global health, we have designed AlertImpact, a tool for evaluating the impact of public health alerts on epidemic spreading [1, 2]. In case of a pandemic, the World Health Organization (WHO) recommends assessing the suspension of activity in educational, government and business units as a plausible measure to reduce the transmission of a disease. Following these recommendations, governments usually institute policies that aim to reduce individual mobility in order to control an epidemic. However, countries might suffer if these actions are continued over time, especially in emerging regions with weaker and more informal economies.
AlertImpact makes it possible to understand the impact of health alerts on the spreading of an epidemic. As a result, health authorities can calibrate their future action plans. Specifically, AlertImpact uses cell phone records to build an agent-based framework that models the social and mobility patterns of a population under different policy actions.
Figure 3: AlertImpact makes it possible to explore and evaluate different policy measures taken during the spread of an epidemic.
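To give a flavour of what an agent-based model looks like, here is a heavily simplified Python sketch. It is not the AlertImpact model: the contact network, the single `mobility` parameter standing in for a policy action, and the assumption that an agent is infectious for exactly one step are all simplifications introduced for this example.

```python
import random


def simulate(contacts, seeds, p_transmit, mobility, steps, rng):
    """Return the set of agents infected at any point during the run.

    contacts   -- dict mapping each agent to the agents it normally meets
    mobility   -- fraction in [0, 1] of normal contacts that still happen
    p_transmit -- probability that a contact transmits the disease
    """
    infected = set(seeds)   # currently infectious agents
    recovered = set()       # agents that can no longer be infected
    for _ in range(steps):
        newly = set()
        for agent in sorted(infected):  # sorted for reproducible runs
            for other in contacts[agent]:
                if other in infected or other in recovered:
                    continue
                # The contact must take place, and then transmit.
                if rng.random() < mobility and rng.random() < p_transmit:
                    newly.add(other)
        recovered |= infected   # infectious for one step only
        infected = newly
    return recovered | infected


# A toy chain of four agents: a - b - c - d.
contacts = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}

no_policy = simulate(contacts, ["a"], 1.0, 1.0, 3, random.Random(0))
full_lockdown = simulate(contacts, ["a"], 1.0, 0.0, 3, random.Random(0))
```

With certain transmission and normal mobility the infection reaches all four agents in three steps, while with mobility set to zero it never leaves the first agent. Intermediate values of `mobility` give outcomes between these two extremes, which is the kind of comparison between policy scenarios that such a framework is meant to support.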
We used AlertImpact to understand the impact that the preventive actions taken by the Mexican government to control the H1N1 flu outbreak in 2009 had on the spreading of the epidemic. On April 16th the authorities raised a medical alert, followed by the closing of schools and universities on April 27th and the final shutdown of all basic activities on May 1st. AlertImpact used cell phone data from Mexican citizens to model their social and mobility patterns during the preventive actions and compared them against a baseline representing the normal behavior of the population under no restraining policies.
Figure 3 shows how the mobility of the population was reduced by up to 30% thanks to the preventive actions taken by the government. Our results also determined that this reduction in mobility resulted in a 10% decrease in the number of infected cases, and that the peak of the epidemic was postponed by 40 hours, allowing authorities to react faster to control the epidemic.
CenCell and AlertImpact are just two examples of the type of analysis that can be done using cell phone data, and of its impact on important policy areas like urban planning, global health or crisis management. In the future, we expect to extend our research by engaging with public and private institutions that will help us enrich our analyses with additional features catering to specific analytical or evaluation needs. Ultimately, the aim is to enhance policy decision making in low-income communities across Latin America.
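For readers who want to see how a figure such as a maximum reduction is obtained, the arithmetic is a relative change against the baseline. The Python sketch below uses invented daily values, not the Mexican data, and the function name `percent_reduction` is our own.

```python
def percent_reduction(baseline, observed):
    """Relative drop of `observed` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - observed) / baseline


# Invented daily mobility values (e.g. average distance travelled, in km).
baseline_km = [10.0, 10.0, 10.0]   # normal behaviour, no restrictions
observed_km = [8.0, 7.5, 9.0]      # behaviour during the alert

daily = [percent_reduction(b, o) for b, o in zip(baseline_km, observed_km)]
largest = max(daily)
```

The maximum reduction is simply the largest of the daily values, which is why it can be considerably higher than the average reduction over the whole alert period.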
[1] E. Frias-Martinez, G. Williamson, and V. Frias-Martinez. An agent-based model of epidemic spread using human mobility and social network information. In The 3rd IEEE Int. Conf. on Social Computing (SocialCom 2011), Boston, MA, USA.
[2] V. Frias-Martinez, A. Rubio, and E. Frias-Martinez. Measuring the impact of epidemic alerts on human mobility using cell-phone network data. In Second Workshop on Pervasive Urban Applications, PURBA, at the Tenth International Conference on Pervasive Computing.
[3] V. Frias-Martinez, V. Soto, J. Virseda, and E. Frias-Martinez. Computing cost-effective census maps from cell phone traces. In Second Workshop on Pervasive Urban Applications, PURBA, at the Tenth International Conference on Pervasive Computing.
[4] V. Soto, V. Frias-Martinez, J. Virseda, and E. Frias-Martinez. Prediction of socioeconomic levels using cell phone records. In User Modelling, Adaptation and Personalization, UMAP, 2011.
* Telefonica is a world leader in the telecommunication sector, with a presence in over 23 countries and more than 218 million customer accesses. Services offered by the Telefonica group include mobile and fixed line phone, ISP, IPTV, web portals, and others. Telefonica Research is the innovation company of the Telefonica Group. Owned 100% by Telefonica, it was formed in 1988 with the aim of strengthening the Group’s competitiveness through technological innovation. It is the most important private research company in Spain in terms of size, activities, resources, and participation in European research projects.