The definition of hate speech was based on academic research in the social sciences. It was turned into a set of hate speech categories, which were then used to manually identify examples of hate speech in a data set of online messages. These annotations served as training data for Utopia AI Moderator, a language-independent tool that uses text analytics and machine learning. The data set comprised 12 million Finnish comments and posts from September to October 2020.
The results show that about 150 000 messages containing hate speech appear on publicly available Finnish social media platforms every month, or about 1.8% of all messages.
Among the public international social media platforms, Twitter seems the most prominent, with 7 450 messages identified as hate speech, or 0.14% of all tweets. Retweets play a significant role in circulating these messages: 39% of all hate-speech tweets are duplicates.
“While the data set consisted of mostly Finnish messages,” says Utopia’s CEO Dr. Mari-Sanna Paukkeri, “the results would be very similar in other languages. For example, the major platform for Finnish hate speech, Ylilauta, is a peer to the commonly known 4chan. Moreover, we can build a similar AI model to identify hate speech in any language in only two weeks. We only need a skilled individual to say how hate speech should be defined in your culture and language and we need the data to analyse.”
Media library/photos: https://utopiaanalytics.com/media-library