Emoticon Use in Arabic, Spanish and English Tweets
Most of our online media projects have some version of sentiment analysis in them. The problem is, of course, that extracting sentiment programmatically from language is a notoriously hard process. It’s even hard for humans to do.
Often we opt to use search keywords (the taxonomy) to group content into two or more groups that are relevant to the problem at hand. That could be for vs. against a campaign message, discriminatory vs. anti-discriminatory, or similar (compare that to the positive and negative sentiment used by most automatic classifiers). That said, we also tend to do sentiment analysis in its purest form: looking for words with positive or negative sentiment such as good, happy, bad etc. But what about negated phrases such as not good or not bad? And what about trickier cases like the phrase I don’t consider it good? Or the vast levels of sarcasm used on social media?
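To make the negation problem concrete, here is a minimal sketch of a word-counting scorer. The tiny word lists and the function name `naive_sentiment` are made up for illustration; a real lexicon would be far larger.

```python
# Hypothetical mini-lexicon, for illustration only.
POSITIVE_WORDS = {"good", "happy", "great"}
NEGATIVE_WORDS = {"bad", "sad", "awful"}


def naive_sentiment(text):
    """Return (number of positive words) minus (number of negative words)."""
    score = 0
    for raw_word in text.lower().split():
        word = raw_word.strip(".,!?;:")  # drop surrounding punctuation
        if word in POSITIVE_WORDS:
            score += 1
        elif word in NEGATIVE_WORDS:
            score -= 1
    return score


# The scorer ignores "not", so negated phrases come out with the wrong sign.
for phrase in ["happy", "not good", "not bad", "I don't consider it good"]:
    print(phrase, "->", naive_sentiment(phrase))
```

Because the scorer sees only individual words, not good scores as positive and not bad as negative, which is exactly the failure described above.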
It’s possible to use other approaches such as crowdsourcing; Amazon Mechanical Turk has tagging of tweets as positive or negative as one of its custom tasks. Or you can train a classification algorithm on a small sample of tagged tweets and then use it to classify new ones (a great place to learn more is this classic primer on sentiment analysis).
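As a rough illustration of the train-then-classify idea, the sketch below implements a small Naive Bayes classifier with add-one smoothing in plain Python. Naive Bayes is just one common choice and is an assumption here, as are the function names and the four hand-written training examples.

```python
import math
from collections import Counter, defaultdict


def train(tagged_tweets):
    """Count words per label from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocabulary = set()
    for text, label in tagged_tweets:
        label_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocabulary.add(word)
    return word_counts, label_counts, vocabulary


def classify(text, model):
    """Return the label with the highest log-probability for the text."""
    word_counts, label_counts, vocabulary = model
    total_tweets = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_tweets)
        # Add-one (Laplace) smoothing so unseen words do not zero out a label.
        denominator = sum(word_counts[label].values()) + len(vocabulary)
        for word in text.lower().split():
            if word in vocabulary:  # words never seen in training are skipped
                score += math.log((word_counts[label][word] + 1) / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up training sample, standing in for a set of hand-tagged tweets.
model = train([
    ("love this great day", "positive"),
    ("so happy today", "positive"),
    ("this is awful", "negative"),
    ("hate this bad day", "negative"),
])
print(classify("great happy", model))
print(classify("awful bad", model))
```

With so little training data this is a toy; in practice the tagged sample would need to be much larger and drawn from the tweets being studied.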
In the spirit of Occam’s razor, one of the simplest and most durable solutions is to count emojis and/or emoticons. Sarcasm is used much more rarely with these symbols, and their meanings are not nearly as complex as natural language.
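Counting emoticons needs very little code. The sketch below is one way to do it, assuming emoticons appear as whitespace-separated tokens; the emoticon list, its polarity labels, and the function names are illustrative choices, not the exact set used in our experiment.

```python
from collections import Counter

# Illustrative emoticon list with assumed polarity labels.
EMOTICON_POLARITY = {
    ":)": "positive",
    ":-)": "positive",
    ";)": "positive",
    "(:": "positive",
    "\\o/": "positive",
    ":(": "negative",
    ":-(": "negative",
    "):": "negative",
}


def count_emoticons(tweets):
    """Count each known emoticon that appears as a standalone token."""
    counts = Counter()
    for tweet in tweets:
        for token in tweet.split():
            if token in EMOTICON_POLARITY:
                counts[token] += 1
    return counts


def polarity_totals(counts):
    """Sum emoticon counts into positive and negative totals."""
    totals = Counter()
    for emoticon, n in counts.items():
        totals[EMOTICON_POLARITY[emoticon]] += n
    return totals


sample = ["great day :)", "missed the bus :(", "we won \\o/ :)"]
counts = count_emoticons(sample)
print(counts)
print(polarity_totals(counts))
```

Matching whole tokens misses emoticons glued to a word (as in "nice:)"), but it avoids false positives such as a closing bracket followed by a colon in ordinary punctuation.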
Inspired by this, we did a small experiment on the sentiments expressed using emoticons in a sample of tweets in three different languages: English, Arabic, and Spanish.
Here are English-language tweets using positive emoticons:
And here are English-language tweets using negative emoticons:
Generally people use more positive emoticons than negative emoticons, depending on the topic. The variety of different ways to express positive feelings is much greater than for negative feelings; there are subtle differences between :-) and ;) and \o/.
Let's look at a selection of Arabic tweets:
The Arabosphere also seems to be generally optimistic, using positive emoticons more often than negative ones. Interestingly, although Arabic is written right-to-left, the opposite direction to European languages, the most-used emoticons still face left-to-right rather than being rotated to follow the direction of the text. However, the second most popular emoticons do rotate to face the direction of the text, unlike in English or Spanish.
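One simple way to tell the two orientations apart is to check which end of the emoticon holds the eyes. This is only a sketch under that assumption: the set of eye characters, the labels, and the function name `orientation` are all illustrative.

```python
# Characters assumed to represent eyes in sideways emoticons.
EYE_CHARS = {":", ";"}


def orientation(emoticon):
    """Classify an emoticon by which end the eyes are on.

    "ltr"   - eyes first, as in :) or ;)
    "rtl"   - eyes last, as in (: or ):
    "other" - no clear sideways orientation, as in \\o/
    """
    if len(emoticon) < 2:
        return "other"
    starts_with_eyes = emoticon[0] in EYE_CHARS
    ends_with_eyes = emoticon[-1] in EYE_CHARS
    if starts_with_eyes and not ends_with_eyes:
        return "ltr"
    if ends_with_eyes and not starts_with_eyes:
        return "rtl"
    return "other"


for emoticon in [":)", "(:", ":-(", "):", "\\o/"]:
    print(emoticon, orientation(emoticon))
```

Grouping emoticon counts by this label gives a quick check on how often each orientation turns up in each language.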
In order to compare how different emoticons are used in each language, we calculated the proportion of tweets using each emoticon (overall, only about 1% of tweets contain an emoticon). In particular, we compared how much more often each emoticon is used in Arabic and Spanish than in English. Whenever the dark blue or red lines go above the light blue line, that means that emoticon is used more than in English. So for example, (: is used in Arabic approximately 30 times as often as in English, whereas ;) is used much less frequently.
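The comparison boils down to two steps: turn counts into per-tweet rates, then divide each rate by the English rate. A minimal sketch follows; the function names are hypothetical and the counts are invented purely to show the arithmetic, not taken from our data.

```python
def usage_rates(emoticon_counts, total_tweets):
    """Convert raw emoticon counts into a proportion of all tweets."""
    if total_tweets <= 0:
        raise ValueError("total_tweets must be positive")
    return {e: n / total_tweets for e, n in emoticon_counts.items()}


def ratio_to_baseline(rates, baseline_rates):
    """Divide each rate by the baseline rate for the same emoticon.

    Emoticons that never appear in the baseline are skipped, since the
    ratio would be undefined.
    """
    ratios = {}
    for emoticon, rate in rates.items():
        baseline = baseline_rates.get(emoticon, 0.0)
        if baseline > 0:
            ratios[emoticon] = rate / baseline
    return ratios


# Invented counts for illustration only.
english_rates = usage_rates({":)": 50, "(:": 5}, total_tweets=10_000)
arabic_rates = usage_rates({":)": 25, "(:": 20}, total_tweets=5_000)

ratios = ratio_to_baseline(arabic_rates, english_rates)
for emoticon, ratio in ratios.items():
    print(emoticon, round(ratio, 2))
```

A ratio above 1 means the emoticon is relatively more common than in English, and a ratio below 1 means it is less common. Using rates rather than raw counts keeps the comparison fair when the samples differ in size.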
Likewise, we can look at negative emoticons and how much more they are used in each language. It appears that negative emoticons are used much less frequently in English than in either Arabic or Spanish.