A blog post by our radio data science team at Pulse Lab Kampala.
Keyword Spotter, Pulse Lab Kampala’s New Speech Recognition and Filtering Software
The year 2017 was an exciting one for many of us, and for Pulse Lab Kampala too! Not only did we have a lot of interesting projects going on, but we also developed the Keyword Spotter, the little brother, so to speak, of Spock, our previous radio mining application. The new toolkit is faster, and its technology package is easier to adapt to new languages.
This was an important milestone for Pulse Lab Kampala’s Radio Content Analysis Tool, our most innovative application, which allows mining of public radio conversations in Uganda. The information it surfaces can then be used to inform policies or programmes, and to identify misconceptions and information gaps that can be addressed through public sensitization initiatives.
How does it work?
First Mission: Identifying Key Words
To train the Keyword Spotter in a new language, the first step is to collect what is called “training data.” Training data is what the machine uses to learn a new language. We sat with our partners who wanted to use the tool to support their work. Together we identified about 30 keywords related to two specific topics of interest: health, and BTVET (Business, Technical, Vocational Education and Training). These keywords were then to be translated by mother tongue speakers and recorded as training data for the speech recognition tool. We estimated that at least 50 different voices were needed to obtain a diverse enough sample of keyword sound files and allow the machine to learn to recognize the basic underlying sound pattern.
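The targets above (about 30 keywords, at least 50 different voices) lend themselves to a simple coverage check while recordings are being collected. The following is only a minimal sketch, not the tool's actual code: it assumes each recording is logged as a (speaker ID, keyword) pair, it reads the 50-voice target as applying to every keyword, and the function name `coverage_gaps` and the sample data are hypothetical.

```python
from collections import defaultdict

# Target taken from the text: at least 50 different voices.
# Applying it per keyword is an assumption made for this sketch.
MIN_VOICES = 50


def coverage_gaps(recordings, keywords, min_voices=MIN_VOICES):
    """Return {keyword: number of additional distinct speakers still needed}.

    recordings: iterable of (speaker_id, keyword) pairs (hypothetical log format).
    keywords:   the full keyword list, so never-recorded keywords show up too.
    """
    speakers = defaultdict(set)
    for speaker_id, keyword in recordings:
        # A set ignores repeat recordings of a keyword by the same speaker.
        speakers[keyword].add(speaker_id)
    return {
        kw: min_voices - len(speakers[kw])
        for kw in keywords
        if len(speakers[kw]) < min_voices
    }


if __name__ == "__main__":
    # Made-up sample log; the keywords are placeholders, not the real list.
    log = [("s01", "clinic"), ("s02", "clinic"), ("s01", "training")]
    print(coverage_gaps(log, ["clinic", "training", "malaria"], min_voices=2))
```

Counting distinct speakers rather than raw files reflects the stated goal of voice diversity: ten recordings from one person add little variety.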
The plan for these recordings was quite simple and straightforward: travel hundreds of kilometers, meet 24+ different local people from each region, explain the tool to them, ask for their permission to record their voice, and record. However, during our first field mission we quickly realized that this was not as easy as it sounded.
First Mission: Not so Easy
We learned a number of important lessons from this first mission:
1. Translate all keywords into the local language first. People often translated the same English word differently, depending on their interpretation. The word “youths,” for example, was sometimes translated as children (anzi nyri), sometimes as youth (ba ode), and sometimes as young males (karile). These different words evidently do not share the same basic sound patterns and would not provide us with proper training data. Providing a standard list of the keywords in Luganda and Acholi to be read out loud was key to obtaining good training data.
2. Gaining trust is important. If people are not entirely sure about the intent of recording their voices, they can get suspicious. On one occasion, people suspected our staff of wanting to use the recordings for witchcraft.
3. Find a good place to record. Although the towns we visited weren’t big, their streets could be quite busy, with trucks and boda bodas (motorcycle taxis) rattling around. Finding a quiet spot is difficult at times, but it is essential for good quality sound recordings.
4. Ask people to utter the keywords as naturally as they can. People often tried to pronounce each keyword very slowly and carefully, in an attempt to sound clear enough. This is great when trying to learn a new language, but not useful when trying to train a machine to pick up sounds from natural conversations taking place on radio.
Second Mission: Building the Networks
Based on our first experience and these key lessons, we decided to adopt a different approach for our second mission. Together with the radio analysts, the team came up with the idea of using our own networks and relationships to gain people’s trust and ask for their time to help us with the keyword recordings. One of our radio analysts, for example, had previously interned with a research association in the Rwenzori region, and still knew a number of people there. Because of this connection, people felt comfortable and agreed to help with the recordings. In turn, they also asked their friends and family to help us.
Over two weeks, split equally between Arua and Fort Portal, the team was able to record more than twice the targeted number of people: men and women, boys and girls, of different backgrounds and ages, and with differently pitched voices.
During this last mission, we found the Ugandan voices we were looking for. This gave us the basic training data to develop the first Lugbara and Rutoro language models. However, in data science and machine learning, more training data is almost always better. Encouraged by our success, we set the bar higher and aimed to collect at least 75 speech training samples per keyword to make the tool more accurate and efficient. In our next blog post, we will explain how we obtained them and how we improved our Keyword Spotter with additional training data gathered in a way that had never been tried before. As the Keyword Spotter becomes functional and is used by our partners, together we get closer to informing policies for more effective service delivery in the health and education sectors of Uganda.