In August 2014, the National Information Society Agency of Korea launched a big data challenge called the 'Big Contest', with two levels of difficulty aimed mainly at students. At Global Pulse, we believe that big data challenges are a great way to harness the wisdom of the crowd to innovate for the public good. Pulse Lab Jakarta staff interviewed Shinae Shin (Director, Big Data Strategy Center), one of the main drivers behind the project, and asked her to share some lessons from this experience.
Motivation and Goals
Q: What is the idea behind Big Contest 2014, and how important is big data for South Korea?
A: Recently, there has been growing interest in big data in South Korea. We think big data challenges are useful for raising awareness of this area, developing big data analytical skills in Korea and fostering talent in this field. To promote the use of big data, last year the National Information Society Agency (NIA) organized a big data competition, "Big Contest 2013", in collaboration with both the public and private sectors and with the support of the Ministry of Science, ICT and Future Planning.
Big Contest 2014 is the second edition. It offers more data and more interesting challenges than the first, creating good opportunities for both the participants and the private sector companies involved. We hope that the Big Contest will help foster data-driven decision-making in the public and private sectors.
In terms of the choice of theme, the Korean President often emphasizes the importance of Korea's creative economy. We think that harnessing big data is a means to boost the creative economy, with opportunities and potential across both the public and private sectors. South Korea is also one of the global leaders in information technology, and we hope to become a key player in using big data as well, drawing on our world-class networks and IT infrastructure.
Q: What was the profile of participants who entered the Big Contest 2014?
A: We created two tracks, or leagues, called the (a) Futures League and (b) Challenge League, which differ in the complexity of their challenges. The Futures League is open to high school and university students, while the Challenge League targets undergraduate and graduate students as well as adults who do not already work in big data analytics. Individuals could participate, but teams of up to five people could also enter. Entrants spent about two and a half months, between August 1st and October 16th, 2014, analysing the data before submitting their results.
Q: What challenges did you pose to Big Contest 2014 participants?
A: Each league tackled one question. The Futures League worked on predicting the popularity of various films. The Challenge League worked on forecasting sales, using supermarket sales data from specific areas.
Q: How did you decide the winners?
A: The winners were chosen over two rounds of evaluation. The jury, made up of five representatives from academia and 12 from the private sector, reviewed the submitted results (such as prediction models, scenarios and outcomes) and selected a shortlist of 17 teams. The final winners were then chosen based on their presentations. The presentations were open to the public, because we also wanted to give the private sector an opportunity to find talented people whom they might want to hire.
About the Data
Q: What type of data did you use and how did you prepare the datasets to be used in the contest?
A: The datasets came from both the public and private sectors. The data for the Futures League included (a) statistics and details of films shared by the Korea Film Council (KOFIC) and (b) 18 months' worth of social data derived from thousands of online comments about movies, provided by a Korean movie portal company.
The data for the Challenge League consisted of several datasets: (a) transportation usage from buses and the subway system, including aggregated boarding, disembarkation and transfer records, journey time ranges and the coordinates of transport hubs; (b) information on surrounding facilities; and (c) aggregated credit card consumption patterns.
All sensitive information was removed, and privacy was protected using anonymization and aggregation methods.
Q: How did you select contest partners? Did the partners take the initiative of sharing data, or did you directly request the data?
A: The National Information Society Agency (NIA) and the Federation of Big Data Associations in Korea, together with experts from the private sector, academia and government, identified potential partners. The data was provided by five organizations that are broadly supportive of our aims.
About the Challenge
Q: What are the expected benefits from this contest?
A: This contest aims to foster a better big data ecosystem by allowing more people to gain real experience with big data (rather than small or synthetic data) and to practice their skills. One benefit of such a challenge is that it gives people a chance to analyze big data to solve real problems on the ground, such as predicting the popularity of films and forecasting sales. For example, using data analysis to forecast the number of film viewers could build interest among people in the film industry and help raise initial investment funds, which in turn could contribute to the wider economic development of the Korean film industry.
Before this contest, only a small number of people could access and analyze this type of big data, so the contest also raised awareness of the importance of big data, which will eventually help build a better big data ecosystem across the public and private sectors.
Q: Do you have plans for the next contest?
A: Yes, we will continue to organize a Big Contest each year. In Big Contest 2015, the Futures League problem will be predicting the winning rates of sports teams, along the lines of the film 'Moneyball' or Google's predictions during the 2014 World Cup.
The Challenge League will use the same question as the Futures League (predicting the winning rates of sports teams) so that participants can upgrade and refine their prediction models. We will ask the winners to open up their analytical and prediction models for others to reference and learn from.
Q: How many registrations and submissions did you get? Were there any that surprised or inspired you? What feedback have you received about the Contest from participants?
A: We had 386 teams register initially, but in the end 88 teams (50 from the Futures League and 38 from the Challenge League) submitted entries. Overall, the results were of high quality, but "Kim's Diner team", which won the Grand Prize in the Challenge League, really exceeded expectations. Instead of using the existing tools provided by the organizers, the team built new forecasting models and developed them to the point where they could be used in an actual working environment. This really underlined the value of this type of challenge.
The overall response was good, and many participants had great experiences exploring different data mining and analytics methods that they had not previously had the chance to apply to "real" big data, since most of their prior experience was theoretical. Some participants also said that manipulating and processing such large datasets was difficult, but that they enjoyed the intellectual challenge.
Q: How do private-sector companies benefit from providing data in this contest?
A: Apart from the main purpose of raising awareness of the benefits of using big data, we think there are at least three advantages for participating private sector companies. First, they can enhance their brand image by focusing their efforts on big data. Second, they can learn technical approaches from participants and their entries, for instance how to analyze big data and tackle problems. Third, the challenge gives them opportunities to identify talented people who understand big data analytics.
Q: What did you learn through holding a big data challenge?
A: One lesson we learned is that some companies are reluctant to contribute their data to the contest, mostly because they are worried about unforeseen issues arising.
We also learned that the Big Contest works better for everybody when we can provide more, and more diverse, types of big data. So in 2015 and beyond, we will keep discussing data access with more private sector companies, seek to understand their concerns and engage more deeply with them.