Abstract
The rise of hate speech on social media is a major cultural threat, as social media platforms are being used to shape the opinions of many people. With anti-refugee rhetoric increasing in various parts of the world, online hate speech against refugees is becoming a cause of concern for the United Nations (UN), as it has been directly linked to acts of violence and atrocity crimes. Yet very little research has been conducted on the detection and analysis of hate speech in the context of refugees. Therefore, this thesis aims to answer the research question "How can hate speech against refugees on social media platforms be detected and measured using Natural Language Processing methods?" This work shows that deep learning models can be used successfully to classify hate speech on social media in the context of international refugee crises by relying on transformer-based architectures and leveraging a combination of 12 annotated hate speech datasets from various contexts. The best-performing model achieves a macro F1-score of 81.0% on in-domain test data and an accuracy of 73.5% on a dataset of refugee-related tweets. Moreover, the model exhibits solid performance on HateCheck, a suite of functional tests for online hate speech, especially for targeted groups such as refugees and immigrants. Applying the best-performing model to English Twitter posts yielded measured hate speech levels between 2% and 50% in the datasets surrounding five international refugee crises. Tokens such as "refugees", "refugee", "igrants", and "immigrants" from the refugee-related data were among the most influential features when predicting the hate speech class. All results were validated by UNHCR, the UN refugee agency, and lay the foundation for creating a comprehensive system to measure, monitor, and analyze hate speech against refugees as part of the UN strategy and plan of action on hate speech.
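The abstract reports three kinds of figures: a macro F1-score, an accuracy, and a hate speech level per dataset. The sketch below illustrates how such figures are commonly computed from true and predicted labels. It is not the thesis code: the label names `"hate"` and `"none"`, the function names, and the toy data are assumptions made purely for illustration.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def accuracy(y_true, y_pred):
    """Share of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def hate_speech_level(y_pred, hate_label="hate"):
    """Share of posts that the model assigns to the hate speech class."""
    return sum(p == hate_label for p in y_pred) / len(y_pred)


# Hypothetical toy labels, not data from the thesis
y_true = ["hate", "none", "none", "none", "hate", "none"]
y_pred = ["hate", "none", "none", "none", "none", "none"]

print(f"macro F1: {macro_f1(y_true, y_pred):.3f}")
print(f"accuracy: {accuracy(y_true, y_pred):.3f}")
print(f"hate speech level: {hate_speech_level(y_pred):.3f}")
```

Macro F1 weights each class equally, so on imbalanced data, where hate speech is the minority class, it can be noticeably lower than accuracy for the same predictions.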
Programme | MSc in Business Administration and Data Science (Graduate Programme), Final Thesis |
---|---|
Language | English |
Publication date | 2022 |
Number of pages | 141 |
Supervisors | Raghava Rao Mukkamala & Sippo Rossi |