Following up on yesterday’s post on Platform Requirements for Global Pulse, I wanted to talk about our Open Architecture strategy. A few observations:
- We really don’t want to build this platform from scratch.
- There are great tools out there that we could integrate to get us quite far along toward an end-to-end implementation. If we come up with some sensible, workable APIs, many existing tools could all be wired up as data providers, mapping components, analytical services, etc.
- Some components we’ll probably have to implement from scratch to fill in the gaps where no existing technology can be easily modified to address requirements. If we’re lucky, there will be relatively few of these.
- The only way this platform will ever become sustainable is for us to grow an ecosystem of parties that both receive value from it and add value to it.
- While everything we build ourselves will be free and open source, we neither expect nor require that all components of a given end-to-end implementation would be open source. It’s the open standards support that is key. If the private-sector Big Dogs want to go head-to-head to offer the best proprietary visual analytics plug-in for Global Pulse, we won’t get in their way.
- The flipside of this scenario, however, is that we need to ensure that we do end up with a complete, end-to-end implementation based exclusively on free and open source software, so that no government or other organization wishing to deploy the platform would incur licensing fees.
An open, standards-based reference architecture is clearly the way to go here. Before Pulse Camp wraps up on Dec. 3rd, I hope we’ll be able to identify some of the key building blocks and associated data standards within that architecture. Along the way, we’ll need to:
- Set the core values and principles behind the ICT aspects of Global Pulse;
- Start establishing a shared language and information model;
- Provide some concrete examples of what Global Pulse could become;
- Identify, at a conceptual level, the capabilities we will need, whether or not they already exist;
- Add clarity around some use case ‘slices’ that cut through the architecture and information model
We know that the success of our mission will depend on creating an open, dynamic architecture that makes it easy for the community of practice to innovate around Global Pulse and around the missions of the practitioners involved. Here are a few of the architectural tenets we’d like to adhere to:
1. Open Data… with Governance
Global Pulse aims to provide value to different actors that have different requirements and desires around data openness and visibility. This creates an interesting tension between betting purely on open data and building an enterprise-centric information exchange that would stifle innovative information flows. Proven patterns to support this principle include:
- Standalone services for authentication, authorization and auditing
- Pervasive awareness of information source, attribution, and license for re-use
- Federation of data including N-way data sharing
- Social graph-based visibility and sharing of data
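To make two of these patterns concrete, here is a minimal sketch of a record that carries its own source, attribution and license, plus a visibility check driven by a social graph. Everything in it is an assumption made for illustration: the `Record` fields, the three visibility levels, the `SocialGraph` class and the sample values are hypothetical and not part of any agreed Global Pulse information model.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Record:
    """A data item that carries its own provenance (field names are hypothetical)."""
    payload: dict
    source: str                   # who produced the data
    attribution: str              # how the source asks to be credited
    license: str                  # re-use terms, e.g. "CC-BY-4.0"
    owner: str                    # user id of the publisher
    visibility: str = "private"   # "public", "connections", or "private"


@dataclass
class SocialGraph:
    """Undirected 'is connected to' relation between user ids."""
    edges: set = field(default_factory=set)

    def connect(self, a: str, b: str) -> None:
        self.edges.add(frozenset((a, b)))

    def connected(self, a: str, b: str) -> bool:
        return frozenset((a, b)) in self.edges


def can_view(record: Record, viewer: str, graph: SocialGraph) -> bool:
    """Decide visibility from the record's own policy and the social graph."""
    if record.visibility == "public" or viewer == record.owner:
        return True
    if record.visibility == "connections":
        return graph.connected(record.owner, viewer)
    return False


graph = SocialGraph()
graph.connect("ministry_a", "ngo_b")

# Illustrative values only; not real data.
rec = Record(
    payload={"indicator": "food_price_index", "value": 112.4},
    source="ministry_a/market-survey",
    attribution="Ministry A Market Survey",
    license="CC-BY-4.0",
    owner="ministry_a",
    visibility="connections",
)

print(can_view(rec, "ngo_b", graph))       # a direct connection of the owner
print(can_view(rec, "stranger_c", graph))  # no connection to the owner
```

Because the license and attribution travel with the record rather than living in a separate catalogue, a downstream mashup can still honor them after the data has been copied or federated.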
2. Creating a Platform for Innovation
As a platform, Global Pulse has to allow unexpected innovative applications to emerge, co-exist and sometimes compete to create value for users. Creating a successful platform is not trivial, as it requires a balance between providing value out of the box for key stakeholders and supporting the creative contributions of a thriving ecosystem of parties. Patterns that support creating such a platform include:
- Documented APIs and data formats for all services
- Fostering applications that add value to information and pass it on, rather than being ‘dead ends’
- Use of documented microformat-based standards for information
- Not trying to provide a ‘portal’ or unique starting point for user experiences, but rather fostering a multitude of sites that can be relevant go-to places for different actors
- Providing centrally hosted, trusted services for authoritative information (e.g. users and social graphs, indicators, etc.) that seed federated services and mashups.
3. Standards-Based Information Exchanges
Exchanging information based on standards is not just an implementation detail; it is a commitment to maintaining a shared understanding of what information means. Global Pulse’s domain model touches upon many topics that currently do not have standards, so it will have to foster their creation and evolution.
Successful patterns for evolving and using standards in a complex space include:
- Create a community of practice with live dialogues around key standards
- Use existing standards when and where they are available
- Evolve new standards based on actual implementations ‘in vivo’ rather than ‘in vitro’.
- Create micro-formats and well-scoped data elements that can be composed, rather than large document exchange definitions
- Keep a pragmatic view: standards evolve in context, and Global Pulse should not bet on any given ISO recommendation without first being informed by implementation experience
- Build upon lower-level standards. For example, XML already specifies how to declare character encodings (including multi-byte encodings such as UTF-16), how to tag internationalized data elements with an explicit language, etc. Naively re-inventing these fundamental aspects of information exchange can lead to confusion, incompatibility, wasted effort and lack of credibility down the line.
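As a small illustration of the last point, the sketch below leans on what XML already provides: an encoding declaration and the standard `xml:lang` attribute. It uses only Python's standard library. The `report` and `title` element names and the sample strings are hypothetical, chosen just for this example.

```python
import xml.etree.ElementTree as ET

# The "xml:" prefix is bound by the XML specification itself to this namespace.
XML_NS = "http://www.w3.org/XML/1998/namespace"

report = ET.Element("report")  # hypothetical element names
samples = [
    ("en", "Flood warning"),
    ("fr", "Alerte aux inondations"),
    ("ar", "تحذير من الفيضانات"),
]
for lang, text in samples:
    title = ET.SubElement(report, "title")
    title.set(f"{{{XML_NS}}}lang", lang)   # serialized as xml:lang="..."
    title.text = text

# Serialize with an explicit encoding declaration.
data = ET.tostring(report, encoding="utf-8", xml_declaration=True)

# Any conforming XML parser can recover both the text and its language.
parsed = ET.fromstring(data)
titles = {t.get(f"{{{XML_NS}}}lang"): t.text for t in parsed.findall("title")}
print(titles["fr"])
```

No custom "language" field or home-grown escaping scheme is needed; the language tag and the character encoding survive the round trip through a generic parser.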
Examples of standards that Global Pulse would inherently benefit from are:
- Data format and ontology standards such as RDF, OWL, and XForms
- Payload-agnostic syndication and federation standards such as RSS and Atom
- Domain-specific metadata formats such as GeoRSS, KML, iCal, and FOAF
- Domain-specific ontologies and schemas such as ICD-9/10 and CAP
- Indicator-sharing standards such as SDMX-HD
- Federated security standards such as OpenID, OAuth, and SAML
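To show how a few of these standards compose, here is a rough sketch that builds a single Atom entry carrying a GeoRSS-Simple point and then reads the location back. It uses only Python's standard library. The `make_entry` helper, the `urn:example:` identifier, the title, the timestamp and the coordinates are all placeholders invented for the example.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
GEORSS = "http://www.georss.org/georss"
ET.register_namespace("atom", ATOM)
ET.register_namespace("georss", GEORSS)


def make_entry(entry_id: str, title: str, updated: str,
               lat: float, lon: float) -> ET.Element:
    """Build an Atom entry with its required id, title and updated elements."""
    entry = ET.Element(f"{{{ATOM}}}entry")
    ET.SubElement(entry, f"{{{ATOM}}}id").text = entry_id
    ET.SubElement(entry, f"{{{ATOM}}}title").text = title
    ET.SubElement(entry, f"{{{ATOM}}}updated").text = updated
    # GeoRSS-Simple encodes a point as "latitude longitude".
    ET.SubElement(entry, f"{{{GEORSS}}}point").text = f"{lat} {lon}"
    return entry


# Placeholder values, not real data.
entry = make_entry(
    "urn:example:report:1",
    "Market price report",
    "2009-12-01T09:00:00Z",
    -1.2921,
    36.8219,
)
xml_bytes = ET.tostring(entry, encoding="utf-8")

# A consumer that knows nothing about the producer can still find the location.
point = ET.fromstring(xml_bytes).find(f"{{{GEORSS}}}point").text
lat, lon = (float(v) for v in point.split())
print(lat, lon)
```

The syndication format stays payload-agnostic: a feed reader that ignores GeoRSS still sees a valid entry, while a mapping component can pick out just the point.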
4. Global Access
Global Pulse ultimately wants its services to be accessible to vulnerable populations. Even when that access runs through indirect channels, such as working with local media, strong consideration has to be given to language, culture and ICT appropriateness. The following are examples of what the Global Pulse platform elements and applications could expect to support to enable global access:
- Ubiquitous, concurrent support for multiple languages and character sets
- Open APIs that allow innovative new systems to fill the gap of the ‘first/last mile’, such as:
  - Services to allow voice interaction
  - Services to simplify reporting information across the literacy gap
  - Information services that disseminate risk information either on-demand or by push
  - Services to search, update and share geospatial information via low-end devices
- Interfacing with a dynamic portfolio of machine- or crowd-based translation services
- Interfacing with a dynamic portfolio of geocoding, reverse-geocoding and gazetteer services
- Treating image, voice and video payloads as first-class citizens
- Support of tools that can scale up as well as out
- Encouraging (albeit not mandating) open-source and locally-sustainable engineering
- Sharing of common design practices around occasional connectivity and data synchronization
- Access to relevant geospatial data on low-end devices
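One common design practice for occasional connectivity is a local "outbox" that queues reports on the device and forwards them, in order, whenever a link is available. The sketch below is only an illustration of that pattern under simplifying assumptions: the `Outbox` class, the `send` callback contract (return `True` on acknowledgement) and the simulated transport are hypothetical, and a real client would also persist the queue to disk and handle duplicates.

```python
from collections import deque
from typing import Callable


class Outbox:
    """Buffers reports locally and forwards them when a link is available."""

    def __init__(self, send: Callable[[dict], bool]) -> None:
        self._send = send              # returns True if the server acknowledged
        self._pending: deque = deque()

    def submit(self, report: dict) -> None:
        """Accept a report immediately, regardless of connectivity."""
        self._pending.append(report)

    def flush(self) -> int:
        """Deliver queued reports in order; stop at the first failure."""
        delivered = 0
        while self._pending:
            if not self._send(self._pending[0]):
                break                  # still offline: keep it for the next attempt
            self._pending.popleft()
            delivered += 1
        return delivered

    def __len__(self) -> int:
        return len(self._pending)


# Simulated transport: a flag stands in for real network state.
link = {"up": False}
received = []


def fake_send(report: dict) -> bool:
    if link["up"]:
        received.append(report)
    return link["up"]


outbox = Outbox(fake_send)
outbox.submit({"id": 1, "text": "well is dry"})
outbox.submit({"id": 2, "text": "road flooded"})

sent_offline = outbox.flush()   # link is down, nothing leaves the device
link["up"] = True
sent_online = outbox.flush()    # both reports are delivered in order
print(sent_offline, sent_online, len(outbox))
```

A report is removed from the queue only after the transport acknowledges it, so a dropped connection in the middle of a flush loses nothing.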
Infrastructural Considerations
We would expect to have a “global” instance of the software running in a UN cloud where anyone could create a workspace, upload some data, and start working with their colleagues. Yet in such cases, data would be stored in the UN cloud, and that won’t work for Member State governments, as much of this data may come with associated privacy, national security, and intellectual property issues. It’s probably safe to assume that many organizations will want to run their own instances of the software behind their own firewalls. Yet they will still want to share information with other users in other workspaces on different infrastructure. Anyone should be able to set up an instance of the software on physically secure infrastructure and share select information with the broader network.
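A toy model of that last requirement might look like the following. It assumes, purely for illustration, that each instance marks records as shareable or not and that only shareable records ever leave it. The `Instance` class, its method names and the sample records are hypothetical, not a proposed API.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """One deployment of the software (a hypothetical model, not a real API)."""
    name: str
    local: dict = field(default_factory=dict)    # record id -> (data, shareable)
    remote: dict = field(default_factory=dict)   # (origin, record id) -> data

    def put(self, record_id: str, data: dict, shareable: bool = False) -> None:
        """Store a record locally; records are private unless marked shareable."""
        self.local[record_id] = (data, shareable)

    def export(self) -> list:
        """Return only the records explicitly marked shareable."""
        return [
            {"origin": self.name, "id": rid, "data": data}
            for rid, (data, shareable) in self.local.items()
            if shareable
        ]

    def ingest(self, items: list) -> None:
        """Keep remote copies apart from local data and remember their origin."""
        for item in items:
            self.remote[(item["origin"], item["id"])] = item["data"]


ministry = Instance("ministry")   # runs behind the organization's own firewall
hub = Instance("global")          # the shared "global" instance

# Illustrative records only; not real data.
ministry.put("household-survey-raw", {"rows": 1200}, shareable=False)
ministry.put("price-index", {"value": 112.4}, shareable=True)

hub.ingest(ministry.export())
print(sorted(hub.remote))
```

The design choice worth noting is the default: a record stays inside its instance unless someone deliberately marks it for sharing, which matches the privacy and national security concerns described above.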
A Semi-Federated Architecture?
The Global Pulse architecture would probably benefit from separating data and services into tiers (non-strict) that can be combined, mashed up and reused to build specific applications. Applications and services in the same tiers might share some characteristics:
- Domain Data Stores: Store the core data of the Global Pulse information model. Should support federation.
- Core APIs: Provide access to information in a secure way, with enough metadata for federation and support for occasional connectivity. Might include APIs for data access, federation and sync.
- Infrastructure Data Stores: Store metadata that cuts across all Global Pulse systems. Federation of this data is probably not needed and may even be a liability.
- Infrastructure Applications and Services: Provide maintenance and utility access to the information in the infrastructure data stores. Might include user/identity management, permissions, community/social data, event logging, collaborative ontology development, data provider management, and usage statistics.
- Core Applications: The minimal set of applications that Global Pulse would need to support in order to fulfill the expectations of key stakeholders. These applications use the Core APIs and Infrastructure APIs. Here we might have various dashboards, collaboration and annotation tools, hypothesis management tools, machine learning services, and the core analytical toolkit.
- Community Applications: The dynamic ecosystem of applications built by the external community with Global Pulse’s support that add value to the information and provide tailored experiences for different types of actors, scenarios, or data processing needs. Might include various toolkits for different kinds of organizations, tools for mobiles, visualization tools, mapping applications, data curation tools, social media integration, alerting tools, and a variety of analytical tools.