Telecom Italia's "Big Data Challenge" was an online call for developers, researchers and designers from all over the world to come up with new big data services and applications. Telecom Italia made a dataset of its own mobile phone data (millions of anonymized and geo-referenced records of calls from Milan and Trento, covering the period of November to December 2013) available for the challenge, in addition to offering other unique data sets provided by partners. In April 2014, winning teams were selected for each of the three tracks of the Big Data Challenge - Data Analytics, Data Visualization and App Development.
At Global Pulse, we believe that big data challenges are a great way to use the wisdom of the crowd to innovate and showcase the potential of big data for public good. We spoke to Telecom Italia's Fabrizio Antonelli and Bruno Lepri about the process of running a successful big data challenge and the company's motivations for doing so.
Motivation and Goals
Q: What was the motivation of Telecom Italia’s Big Data Challenge? Was there a “spark” that launched it?
A: Telecommunications operators hold a vast amount of data, thanks to their large customer base and the integration of their technologies in many market segments. Nevertheless, most carriers have not figured out how to exploit this invaluable asset, as it is removed from their traditional core business. As highlighted by Mayer-Schönberger and Cukier, authors of the best-selling book “Big Data: A Revolution That Will Transform How We Live, Work, and Think”, “Some carriers wrongly see only a cell tower, while it's a data gathering platform.”
Furthermore, we noticed a gap between those holding the big data (usually companies, which don’t know how best to exploit it) and those holding the solutions and the algorithms (usually researchers in academia, who often do not have access to the data sets needed to validate them).
With the Telecom Italia Big Data Challenge, we tried to fill this gap by moving away from the traditional conservative approach of the operators and instead freeing lots of heterogeneous data to let others explore their own innovative ideas on data usage.
As we did not put any constraints on the Intellectual Property Rights (IPRs) of the developed solutions, we gave participants the opportunity to try out their ideas and to exploit and further investigate them beyond the challenge. On our side, we received over 100 big data applications that may become part of Telecom Italia's market offering, and we connected with an expert network of 1100+ professionals all over the world.
The spark to launch the challenge came from SKIL Lab, Telecom Italia’s innovation lab on big data technologies, after participating in a datathon for social good organized by Telefonica in London, together with FBK and MIT Media Lab researchers.
Q: Was there an overarching goal or theme to the Challenge? Were there target audiences identified?
A: Unlike other similar initiatives, we decided to keep the challenge goal-free and to broaden the variety of data provided as much as possible. This left the participants as free as possible to explore and submit their own ideas. Thanks to that breadth, we received projects ranging from the prediction of energy consumption to the geographic clustering of happiness and the identification of pollution factors.
The challenge was open to the public and organized in tracks: application development, analytics and data visualization, targeting developers, data scientists and designers respectively. Interestingly, we received applications both from academic groups (in particular, machine learning, complex systems and network science groups) and from companies (e.g. insurance companies, big data start-ups, etc.).
About the data
Q: What was the process of making the Telecom Italia data sets accessible for the challenge? How did you deal with aggregating the data, and protecting personally identifiable information?
A: We worked under the assumption that data can be extremely useful even when aggregated and anonymized, as they can be an important source for analyzing trends and phenomena at a regional level. The SKIL Lab team worked in synergy with the internal Privacy and Security Compliance departments to ensure that data formats were compliant with national regulations on personal data management. Hence, we aggregated data in geo-time bins that made de-anonymization impossible (i.e. no reference to individual customers was included in the data).
Since we dealt with several data types (mobile data, energy data, GPS, etc.), we worked to find the correct (compliant) data format for each of them, keeping a common geographic reference called the "city pixel". The "city pixel" is a small square unit used to aggregate data. This ensured protection of privacy and made the different data sets easy to cross-analyze and correlate, which was very much appreciated by the participants.
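As an editorial illustration of the idea described above, aggregation into geo-time bins can be sketched in a few lines of Python. This is a minimal sketch, not Telecom Italia's actual pipeline: the cell size, the bin length, the grid origin and the function names `pixel_id`, `time_bin` and `aggregate` are all assumptions made for the example.

```python
from collections import Counter
from datetime import datetime
import math

# All values below are hypothetical, chosen only for illustration.
CELL_SIZE_DEG = 0.002                 # side of one square "pixel", in degrees (assumed)
BIN_MINUTES = 10                      # length of one time bin, in minutes (assumed)
ORIGIN_LAT, ORIGIN_LON = 45.35, 9.00  # south-west corner of the grid (assumed)


def pixel_id(lat, lon):
    """Map a coordinate to the (row, col) of the square cell that contains it."""
    row = math.floor((lat - ORIGIN_LAT) / CELL_SIZE_DEG)
    col = math.floor((lon - ORIGIN_LON) / CELL_SIZE_DEG)
    return (row, col)


def time_bin(ts):
    """Round a timestamp down to the start of its time bin."""
    minute = (ts.minute // BIN_MINUTES) * BIN_MINUTES
    return ts.replace(minute=minute, second=0, microsecond=0)


def aggregate(records):
    """Count events per (pixel, time bin); the output holds no per-customer field."""
    counts = Counter()
    for lat, lon, ts in records:
        counts[(pixel_id(lat, lon), time_bin(ts))] += 1
    return counts


# Made-up example events: (latitude, longitude, timestamp).
records = [
    (45.351, 9.001, datetime(2013, 11, 1, 8, 3)),
    (45.3511, 9.0012, datetime(2013, 11, 1, 8, 7)),
    (45.355, 9.003, datetime(2013, 11, 1, 8, 12)),
]
counts = aggregate(records)
for (pixel, start), n in sorted(counts.items()):
    print(pixel, start.isoformat(), n)
```

Because every data set is keyed by the same pixel and time bin, counts from different sources (for example, calls and energy readings) could be joined on that key, which is presumably what made the challenge data easy to cross-analyze.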
Every participant had to sign a “terms of participation” but no personal data was provided to them.
Q: How did you decide on the four partner companies for the challenge (Citynews - local news platform; Cobra - auto, insurance, telematics; SET Distribuzione - energy distribution; Meteotrentino - weather)? Did those partner companies come forward to share data for the challenge, or did you approach them?
A: One of the successful elements of the challenge was to broaden the variety of data sets provided as much as possible. To reach this goal, we brought on board partners that already do business with Telecom Italia.
About the Challenge
Q: How many and what types of submissions did you get? Were there any that surprised or inspired you?
A: The Telecom Italia Big Data Challenge gathered 100+ submissions from 1100+ participants, most of them on the data analytics track. And surprising submissions? Yes, we were impressed by the work of a team from the ISI Foundation that investigated the distribution and the common patterns of people in the city using the entropy in the data, studying the heterogeneity of their mobile phone activity. We believe that their approach can be easily adopted and is very promising for future analyses.
Another interesting project was the one submitted by a team from Universitat Rovira i Virgili (Tarragona, Spain) and the University of Birmingham (UK). The team proposed a new routing system for urban mobility, leveraging the multi-layered characteristics of the datasets made available.
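To give a rough sense of what an entropy-based measure of heterogeneity looks like, here is a minimal Python sketch using Shannon entropy over activity counts. It is an editorial illustration only: the interview does not describe the ISI Foundation team's actual method, and the function names and the example counts are assumptions.

```python
import math


def shannon_entropy(counts):
    """Shannon entropy, in bits, of the distribution implied by non-negative counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    entropy = 0.0
    for c in counts:
        if c > 0:  # zero counts contribute nothing to the sum
            p = c / total
            entropy -= p * math.log2(p)
    return entropy


def normalized_entropy(counts):
    """Entropy divided by its maximum, log2(n), giving a value between 0 and 1."""
    n = len(counts)
    if n <= 1:
        return 0.0
    return shannon_entropy(counts) / math.log2(n)


# Made-up activity counts for one grid cell across four time slots.
evenly_spread = [5, 5, 5, 5]   # activity spread uniformly over the slots
concentrated = [20, 0, 0, 0]   # all activity in a single slot

print(normalized_entropy(evenly_spread))
print(normalized_entropy(concentrated))
```

A value near 1 indicates activity spread evenly across the slots, while a value near 0 indicates activity concentrated in very few of them.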
Q: What feedback have you received since the Challenge? Have other companies expressed interest in sharing their data for similar Challenges? What was the biggest thing you learned?
A: The feedback on the challenge has been very positive. From the perspective of the data providers, which are not necessarily data companies, I think they were pleasantly surprised by how effective and interesting the analyses of their data could be. At the very beginning, we had to convince companies to participate, as some of them were really cautious about opening up their data and skeptical about the kind of results that could come out of the contest. We invited many representatives of financial, insurance and transportation companies to the award event (the Big Data Jam), and many of them, who were not data providers in this edition, expressed interest in being part of next year's edition and opening up their own data.
We gave participants more than two months to develop their ideas. Before that, we worked hard on data preparation so that the data would be immediately usable by the participants. Despite this, a number of participants asked for more time to work on their ideas, so this should be taken into account when planning future challenges. Currently, Telecom Italia and its partners are in touch with some of the participants to explore bringing some of the submitted apps to market.
At the end of the day, the lesson is that everybody is talking about big data but there is still a long road to exploiting its full potential. The potential of data-driven approaches still has to be fully demonstrated, and we need more success stories, not only in technology, in order to be convincing at all levels (including market level, political and regulatory level, etc.). There is a growing community that can help large organizations to reach their big data goals. Such initiatives are extremely important in filling existing gaps.
Q: What do you think is the biggest motivation for sharing private sector data for open innovation efforts? Do you see mobile providers moving toward greater “private sector open data” or “data philanthropy”?
A: We wouldn’t say that the conditions for private sector data sharing have matured yet, as data still represents one of the most valuable assets that operators hold and they don’t want to miss out on the big data market potential. However, there are encouraging signs of a more open approach to data sharing, especially to deal with societal challenges such as urban planning, emergency detection, identification of growth indices, and so forth.
Q: Are there plans for future Big Data Challenges? What do you think is the biggest value of such Challenges that involve sharing access to private sector datasets that could address societal challenges?
A: The European EIT ICT Labs network, which FBK and Telecom Italia are part of, found the challenge really interesting, and we are now in discussion with other partners with a view to having a similar initiative at a European level (i.e. with multiple European cities involved) next year. We believe such initiatives help organizations and regulators to shape the big data landscape. Our challenges help people to understand the value of data and its potential to impact our lives when it is used in a fair, harmless and privacy-preserving way.
These initiatives help create a data ecosystem made up of large organizations, talented scientists and SMEs, by increasing awareness of the benefits of data sharing and of the synergies among them.
Fabrizio Antonelli earned his master’s in Computer Science at the University of Torino after a scholarship at Arizona State University. His career began at Tilab - Torino, the research center of Telecom Italia. He currently leads the Semantics and Knowledge Innovation Lab (SKIL) of Telecom Italia, located in Trento at the EIT ICT Labs co-located center, whose goal is to investigate and design novel data-driven services. Main projects include heterogeneous Big Data correlation pattern extraction (mostly from telecommunication data) and human dynamics analysis from personal data. The latter includes the development of technologies for personal data preservation, cited by the World Economic Forum as a reference case in 2013 (see Mobile Territorial Lab). In 2014 the SKIL lab organized the Telecom Italia Big Data Challenge, attended by 1100 professionals from all over the world. Fabrizio was recently named one of the 10 Italian Young Innovators Under 35 by the MIT Technology Review program.
Bruno Lepri leads the Mobile and Social Computing Lab (MobS Lab) at Bruno Kessler Foundation (Trento, Italy). In 2014, he co-organized the Telecom Italia Big Data Challenge. Bruno is also a research affiliate at the MIT Media Lab, working with the Human Dynamics group, and he is currently responsible for research at the Mobile Territorial Lab, a living lab launched in November 2012 by Telecom Italia, FBK, MIT Media Lab, and Telefonica involving more than 100 families in Trento. Since 2014, Bruno has been a senior research affiliate of Data-Pop, a global alliance on Big Data and development created by the Harvard Humanitarian Initiative (HHI), the MIT Media Lab, and the Overseas Development Institute (ODI). In 2010 he won a Marie Curie Co-fund post-doc fellowship and has held post-doc positions at FBK and at MIT Media Lab. He holds a Ph.D. in Computer Science from the University of Trento. Bruno is currently working on using mobile phone data, social media, and open data to deal with several societal challenges such as crime levels, energy consumption and the spreading of epidemics.
Image: map by Richard Sachetto via Flickr