As we continued on day three to narrow the discussion from our “user personae” to the actual details of the platform, the conversation grew ever more technical. We also had two excellent presentations about the future landscape of “new data” and some current uses of SMS for data collection and citizen engagement in Uganda.
Piers Fawkes and Jeff Squires from PSFK gave an overview of emerging trends in real-time technology. They emphasized the “data trails” available from a variety of sources and innovative ways of capturing those trails. This includes much of what has been at the heart of Global Pulse thus far, i.e. “online buzz as an indicator of offline status,” but also new ways of collecting information that is already being generated, a “central nervous system for the planet.” Examples include monitoring water flows in developing countries and initiatives like embedding sensors underneath streets to detect where cars are parked. Also noted was the growing trend towards open data, with open policies increasing institutional transparency and accountability. As has been emphasized throughout, these new streams revolve around incorporating temporal and spatial elements as central to our analysis.
There was also an excellent presentation from UNICEF Uganda, detailing three uses of cell phones that UNICEF is currently deploying in Uganda: (1) improving service delivery; (2) community vulnerability surveillance; and (3) citizen engagement. There are challenges with each of these, including data quality and, where the quality of the data is high, the capacity to really maximize the use of that information. Questions concentrated on practical details, such as how texts are paid for (there’s a toll-free short code), language requirements (currently it’s all in English, but they hope to support texting in a variety of languages soon), and how information is reported back to the communities (currently, on paper).
The majority of the day, however, was spent in autonomous breakout sessions around evolving the technical requirements for the Global Pulse platform.
I participated in a discussion with participants from UNU-WIDER and UNDP about the analytical framework for Global Pulse.
This workshop brought up several interesting questions for analysis. Within the Global Pulse team, we have had long conversations about the approach of Global Pulse. Do we look at specific crises? Do we predefine a specific set of indicators that we wish to capture? One of the things that became clear from this conference is that we would be doing a disservice to “the possible” if we are too rigid in defining our analytical approach. The streams of information that we will be getting are at the very beginning stages of being incorporated into analysis, and we will probably be receiving streams of information that we haven’t even conceived of yet. However, it also became clear that for the vast expanse of data that is out there to be of any use at all, we need some framework to be working from. Getting this balance right—to make proper use of the data without inhibiting data streams—will be central to our work with the Labs and in New York.
I joined the rest of the group for the summary of the technical discussions, which had been split into three categories.
Workspace
The workspace, in essence, is how people will interact within the system. As discussed yesterday in relation to the social data, it should be designed to encourage collaboration and is focused on the experience of the analyst. One of the key outputs from this discussion was the idea of a hypothesis-sharing or collaboration network, a “hunch lab,” where analysts can post unverified information for consideration by a relevant community of practice. In a later session, the hunch lab was expanded on considerably.
In short, if an analyst notices a correlation that s/he thinks might be of significance, there would be a space to post a hypothesis and receive feedback in the form of comments and data which could support or challenge that hypothesis. There was much discussion of the exact functionalities of the hunch lab, but it would basically work as follows:
Rhada in Bangalore notices from a few tweets that people might have stopped buying medications in rural areas because food prices have gone up. She posts her hunch in a hunch lab, and a colleague from WFP who is currently doing a survey on food security responds that he will add two questions to his survey to get more information. A local pharmacist posts his supply data, which supports Rhada’s hypothesis. In about a week, the workspace has come together and provided the range of evidence needed to indicate that Rhada was right. Local policy makers who have been monitoring the discussion begin to design a program around the issue.
A key element of this feature is to encourage the linking or merging of similar hypotheses, so that researchers working along the same lines can collaborate.
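To make the idea concrete, here is a minimal sketch of what a hunch-lab data model might look like. This is purely illustrative: the class and method names (Hunch, Evidence, post_evidence, link) are my own invention for this post, not anything agreed in the session.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Evidence:
    author: str        # e.g. "WFP surveyor" or "local pharmacist"
    description: str   # what the evidence shows
    supports: bool     # True if it supports the hypothesis, False if it challenges it

@dataclass
class Hunch:
    author: str
    statement: str                                        # the unverified hypothesis itself
    evidence: List[Evidence] = field(default_factory=list)
    linked: List["Hunch"] = field(default_factory=list)   # similar hypotheses

    def post_evidence(self, author: str, description: str, supports: bool) -> None:
        self.evidence.append(Evidence(author, description, supports))

    def link(self, other: "Hunch") -> None:
        """Link a similar hypothesis so the two communities can collaborate."""
        self.linked.append(other)
        other.linked.append(self)

    def balance(self) -> int:
        """Crude signal: supporting minus challenging pieces of evidence."""
        return sum(1 if e.supports else -1 for e in self.evidence)

# Rhada's scenario from above, replayed against the sketch:
hunch = Hunch("Rhada", "Rural households are cutting medication purchases as food prices rise")
hunch.post_evidence("WFP surveyor", "Two food-security questions added to survey; results pending", True)
hunch.post_evidence("Local pharmacist", "Supply data shows falling medication sales", True)
print(hunch.balance())  # 2: the evidence so far leans in favour
```

The `link` method stands in for the merging behaviour described above: once two similar hunches are linked, each community of practice can see the other’s evidence.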
System Architecture
This dealt with the way that information would be stored, where it would be stored, and how it would be shared. The focus of this discussion was a node-based network, or mesh, allowing weaker or stronger information flows based on communities of trust. Each node would be an instance of the Global Pulse platform, and any node could publish an application or an adapter.
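As a rough illustration of the mesh idea, and assuming invented names throughout (Node, connect, share — none of this is the agreed design), trust-weighted flows between nodes might work something like this:

```python
class Node:
    """One instance of the platform in a mesh of peers."""

    def __init__(self, name):
        self.name = name
        self.peers = {}      # peer Node -> trust level in [0, 1]
        self.published = {}  # applications / adapters, keyed by name
        self.inbox = []      # (sender name, item) tuples received from peers

    def connect(self, peer, trust):
        """Stronger trust means a fuller information flow between two nodes."""
        self.peers[peer] = trust
        peer.peers[self] = trust

    def publish(self, name, artifact):
        """Any node can publish an application or an adapter to the mesh."""
        self.published[name] = artifact

    def share(self, item, sensitivity):
        """Send an item only to peers whose trust level clears its sensitivity."""
        for peer, trust in self.peers.items():
            if trust >= sensitivity:
                peer.inbox.append((self.name, item))

kampala = Node("Pulse Lab Kampala")
new_york = Node("Global Pulse New York")
kampala.connect(new_york, trust=0.9)
kampala.publish("uganda_sms_adapter", lambda msg: msg.strip())  # a toy adapter
kampala.share({"indicator": "weekly_sms_reports"}, sensitivity=0.5)
print(new_york.inbox)  # the item flowed, because trust 0.9 clears sensitivity 0.5
```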
Data
Incoming and outbound data were another topic of discussion. In particular, how will this data be shared, especially with those who may not have access to the platform? Some people may be interested in “dirty data,” which is unverifiable or not statistically sound, while others may be interested only in clean datasets. The Global Pulse needs to be able to accept both.
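One way this could be handled — a sketch under my own assumptions, not anything decided in the session — is to tag records with a quality flag on ingestion rather than filtering them out, so each consumer chooses the stream they want:

```python
def ingest(records):
    """Tag each record rather than rejecting it: consumers decide later
    whether they want only verified data or the raw, unverified stream.
    The 'verified' and 'sample_size' fields are invented for illustration."""
    tagged = []
    for rec in records:
        rec = dict(rec)  # copy so the caller's data is untouched
        is_clean = rec.get("verified") and rec.get("sample_size", 0) >= 30
        rec["quality"] = "clean" if is_clean else "dirty"
        tagged.append(rec)
    return tagged

stream = ingest([
    {"source": "household survey", "verified": True, "sample_size": 1200},
    {"source": "twitter scrape", "verified": False},
])
clean_only = [r for r in stream if r["quality"] == "clean"]  # for the statisticians
everything = stream  # trend-spotters may want the dirty data too
```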
The afternoon was taken up by “Open Spaces,” in which participants proposed the topics they felt were important for group discussion. I was in the group on data privacy. We divided the issue into two main categories: (1) the data itself and (2) who sets the privacy settings.
Individual-level data is an extremely sensitive area. All of this data will need to be anonymized; however, in order to disaggregate the data in a way that is useful for analysis, some detail will be required, such as age, profession, and location. In the best of scenarios, this information will be critical to designing programs that best assist those who need them. In the worst of circumstances, that information could make people targets.
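For illustration only, one common pattern is to replace the direct identifier with a salted hash and coarsen the remaining fields — and it is worth stressing that this is pseudonymization rather than true anonymization, so a real system would need stronger guarantees. The field names here are my own:

```python
import hashlib

def anonymize(record, salt="replace-with-secret-salt"):
    """Drop the name; keep just enough detail to disaggregate the analysis.
    Salted hashing is pseudonymization, not full anonymization."""
    return {
        # irreversible pseudonym instead of the person's name
        "id": hashlib.sha256((salt + record["name"]).encode()).hexdigest()[:12],
        # age band rather than exact age
        "age_band": f"{(record['age'] // 10) * 10}s",
        "profession": record["profession"],
        # district-level location rather than village or GPS point
        "location": record["district"],
    }

print(anonymize({"name": "A. Citizen", "age": 34,
                 "profession": "farmer", "district": "Gulu"}))
# {'id': '...', 'age_band': '30s', 'profession': 'farmer', 'location': 'Gulu'}
```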
Private sector data has a very different set of issues, particularly relating to incentives. We would like the private sector to share their data, but it needs to be done in a way that does not threaten the company, in particular by handing detailed market information to competitors.
Following from this, the question of who sets the privacy settings is critical. Does each “node” set its own settings? Does each piece of data have a different privacy setting? Do we have minimum standards of confidentiality defined from New York?
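If the answer to all three questions turns out to be “yes,” the settings would need to compose somehow. A hedged sketch, assuming a simple ordered set of privacy levels of my own devising, is to resolve each piece of data to the most restrictive layer that applies:

```python
# Least to most restrictive; these level names are invented for illustration.
LEVELS = ["public", "network", "node-only"]

def effective_privacy(global_minimum, node_setting=None, item_setting=None):
    """Resolve to the most restrictive level among the layers that are set:
    a floor defined centrally, tightened per node, tightened again per item."""
    candidates = [s for s in (global_minimum, node_setting, item_setting) if s]
    return max(candidates, key=LEVELS.index)

# New York sets a floor, a node tightens it, one sensitive record tightens further:
print(effective_privacy("public", "network"))               # 'network'
print(effective_privacy("public", "network", "node-only"))  # 'node-only'
```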
This is a critical area for Global Pulse to explore, and we are currently gathering our forces for further work on it. Anyone out there reading this blog, feel free to suggest further readings or other resources on data privacy!
Take-away points
I’ll let the technical people comment on the technical challenges, in particular those which will be part of the Random Hacks of Kindness Hack-a-thon. Several interesting points for me came up, however:
First, the programs that we design from New York presuppose being “hooked up.” Those directly accessing the system are not those the system is meant to serve: “the most vulnerable.” However, the active participation of average citizens in developing countries, both as sources and recipients of data, is imperative to the success of the system. The involvement of the Pulse Labs is key here. Garnering the “information exhaust” of those on the other side of the digital divide is a major challenge.
Second, and following from this, our decentralized approach will be fantastic for local and national level analysis. To identify global trends, however, we will have to give more thought to the question of data standards and comparability, and how the information we are pulling from disparate communities can be looked at together.
And finally, the relationship between the technical community and the analytical community needs to be developed and fostered throughout. The technical community needs guidance on the requirements of the analytical community, and the analytical community needs to be consistently challenged by the technical community to incorporate new sources of data into analysis. Both the technical tools and the analytical tools will have to evolve over time.
Overall, an extremely productive and stimulating three days!