Pulse Lab Jakarta, together with Jakarta Smart City, recently participated in the annual Data Science for Social Good (DSSG) fellowship run by the University of Chicago’s Center for Data Science and Public Policy, a summer programme that trains aspiring data scientists to work with government and non-profit partners on innovative projects with social impact. Our project, which proposed analysing CCTV data in Jakarta to improve traffic safety, was selected as one of the global challenges that the fellows took on during their three-month programme.
The problem
Jakarta Smart City, which was set up by the Jakarta provincial government in 2015, works on technology-based services for residents. One of the main problems in Jakarta is the city’s notoriously congested roads: the number of cars and motorcycles rises every year, placing a growing burden on infrastructure that was not designed to accommodate such volumes. Jakarta Smart City, in collaboration with Pulse Lab Jakarta, sought to improve traffic safety by harnessing data gleaned from raw footage recorded by closed-circuit television (CCTV) cameras positioned at various intersections throughout the city. While the Jakarta city government maintains these cameras, the footage is too voluminous to monitor manually.
Enter the DSSG Fellows
A collaboration with the DSSG fellows was a fantastic opportunity to bring some degree of automation to the city’s CCTV network and so encourage effective and efficient resource allocation. The objective was to draw on the time and expertise of the data scientists, who would use the data made available by Jakarta Smart City to design a system that could inform traffic safety measures and the allocation of resources to them. In particular, the collaboration aimed to build a video-processing pipeline to extract structured information from raw traffic video footage.
The process
The project deliverable was defined as a video-processing pipeline that could automatically receive and process CCTV footage and then create structured output suitable for downstream applications (potentially integrating non-video data sources such as traffic data or weather data). Clear project milestones and timelines were worked out and agreed, and regular communication channels were put in place, including Slack and weekly check-in calls with all partners.
How it worked
Jakarta Smart City provided domain expertise and in-depth knowledge of how planning applications and decisions are facilitated, so that the DSSG team could understand the context in which decisions are made. The project relied on deep learning to identify objects in images, a task that humans do well but that is labour-intensive and hard to scale, which makes computer vision the more efficient approach. In essence, the initiative was about converting unstructured video data into structured traffic data that could then be used to detect, classify and describe the objects on the road.
The project involved four main tasks: object detection, object classification, motion detection and semantic segmentation. Object detection means spotting the various objects in a given video frame, while object classification aims to categorise the detected objects accurately. For these two tasks, the YOLOv3 model was used: it places a rectangular box around each object in the video frame and gives a list of possible categories for that object, for instance car, motorbike or truck.
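To make the idea of "structured output" more concrete, the sketch below shows one way the boxes and categories produced by a detector could be stored and summarised. It is an illustration only and not the project's actual code: the `Detection` record, its field names, the `count_by_class` helper and the 0.5 confidence threshold are all assumptions made for this example.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class Detection:
    """One detected object in one video frame (hypothetical record layout)."""
    frame_index: int
    label: str                       # e.g. "car", "motorbike", "truck"
    confidence: float                # detector score between 0 and 1
    box: Tuple[int, int, int, int]   # (x_min, y_min, x_max, y_max) in pixels


def count_by_class(detections: Iterable[Detection],
                   min_confidence: float = 0.5) -> Dict[int, Dict[str, int]]:
    """Count confident detections per frame and per category."""
    counts: Dict[int, Counter] = {}
    for det in detections:
        # Skip low-confidence boxes; the threshold is an assumed default.
        if det.confidence >= min_confidence:
            counts.setdefault(det.frame_index, Counter())[det.label] += 1
    # Return plain dictionaries, ordered by frame index.
    return {frame: dict(c) for frame, c in sorted(counts.items())}


if __name__ == "__main__":
    sample = [
        Detection(0, "car", 0.91, (10, 20, 110, 90)),
        Detection(0, "motorbike", 0.84, (200, 40, 240, 100)),
        Detection(0, "motorbike", 0.32, (300, 60, 330, 110)),  # filtered out
        Detection(1, "truck", 0.77, (50, 30, 260, 180)),
    ]
    print(count_by_class(sample))
```

A per-frame table of counts like this is the kind of structured data that downstream traffic analysis could consume without ever touching the video itself.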

The motion detection task relied on the Lucas-Kanade sparse optical flow method to calculate optical flow (in simple terms, the pattern of apparent motion between two consecutive video frames, caused by the movement of the objects themselves or of the camera). Lastly, semantic segmentation was carried out using a combination of the WideResNet38 and DeepLab3 methods, which helped to separate surfaces such as roads and sidewalks within the video frame. Together, these four tasks made up a pipeline that converts raw, unstructured video frames into data that is ready for analysis.
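For readers curious about what the optical flow calculation involves, here is a minimal single-point sketch of the Lucas-Kanade idea in plain Python. It is not the project's implementation: the function name `lucas_kanade_point` and the parameters `half_window` and `min_det` are assumptions made for this example, and it performs only one estimation step on small greyscale frames stored as nested lists. A production pipeline would more likely rely on a library implementation that adds image pyramids and iterative refinement.

```python
from typing import List, Optional, Tuple

Frame = List[List[float]]  # greyscale frame indexed as frame[row][col]


def lucas_kanade_point(prev: Frame, curr: Frame, x: int, y: int,
                       half_window: int = 3,
                       min_det: float = 1e-6) -> Optional[Tuple[float, float]]:
    """Estimate the (dx, dy) motion of the patch centred on column x, row y.

    Returns None when the patch has too little texture to solve for motion.
    """
    rows, cols = len(prev), len(prev[0])
    margin = half_window + 1  # one extra pixel for the central differences
    if not (margin <= x < cols - margin and margin <= y < rows - margin):
        raise ValueError("window does not fit inside the frame")

    sxx = sxy = syy = sxt = syt = 0.0
    for r in range(y - half_window, y + half_window + 1):
        for c in range(x - half_window, x + half_window + 1):
            ix = (prev[r][c + 1] - prev[r][c - 1]) / 2.0  # horizontal gradient
            iy = (prev[r + 1][c] - prev[r - 1][c]) / 2.0  # vertical gradient
            it = curr[r][c] - prev[r][c]                  # change over time
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
            sxt += ix * it
            syt += iy * it

    # Solve the 2x2 least-squares system
    #   [sxx sxy] [dx]   [-sxt]
    #   [sxy syy] [dy] = [-syt]
    det = sxx * syy - sxy * sxy
    if det < min_det:
        return None  # flat or edge-only patch: motion cannot be determined
    dx = (-syy * sxt + sxy * syt) / det
    dy = (sxy * sxt - sxx * syt) / det
    return dx, dy
```

The method assumes that brightness stays constant and that motion between frames is small, which is why it is applied to a sparse set of well-textured points rather than to every pixel.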
Limitations
Due to time constraints, the team has not yet trained or tested other object detection and classification models. This resulted in certain limitations, such as the inability of the current model to correctly classify bajaj motorcycles (a common mode of transportation in the city), so bajaj are omitted from the final count. To offset such shortcomings, the team endeavoured to include as many tools as possible to aid future data collection, validation and model training. The deliverable included detailed instructions on how to use the Computer Vision Annotation Tool (CVAT), along with much of the code required to fine-tune a model and to run it once it has been trained.
Where do we go from here?
Pulse Lab Jakarta is grateful to the fantastic team at DSSG for their support and enthusiasm for this project. Working across continents, many time zones and in different languages was no easy task, but with a bit of perseverance and patience the project generated the first iteration of the video analysis pipeline.
The 2018 DSSG programme consisted of 24 aspiring data scientists in Chicago, US and 15 other data scientists from across the world convened in Lisbon, Portugal.
The full technical report is available at: http://bit.ly/2xT6vfK.