Abstract
Much research in the social sciences has examined the interpretation of stigma and its influence on human behaviour and health. Stigma results in out-of-group exclusion, distancing, cognitive separation, status loss, discrimination, and in-group pressure, and often leads to disengagement and non-adherence to treatment plans and doctors' prescriptions. However, little work has been conducted on the computational identification of stigma in general, and in social media discourse in particular. In this paper, we develop an annotation scheme and improve the annotation process for stigma identification, which can be applied to other health-care domains. The data from pro-vaccination and anti-vaccination discussion groups are annotated by trained annotators with professional backgrounds in social science and health-care studies, so this group can be considered expert on the subject in comparison to a non-expert crowd. Amazon MTurk annotators form a second group; because nothing is known about their educational background, they are initially treated as a non-expert crowd on the subject of stigma. We analyze the annotations with visualisation techniques and features from the LIWC (Linguistic Inquiry and Word Count) list, and make predictions based on bi-grams with traditional and deep learning models. A data augmentation method and the application of a CNN yield high accuracy in comparison to other models. The success of the rigorous annotation process in identifying stigma is reconfirmed by the high prediction rate achieved with the CNN.
Original language | English |
---|---|
Title of host publication | Proceedings of The 12th Language Resources and Evaluation Conference (LREC 2020) |
Editors | Nicoletta Calzolari, Frederic Bechet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis |
Number of pages | 13 |
Place of Publication | Paris |
Publisher | European Language Resources Association |
Publication date | 2020 |
Pages | 1178-1190 |
ISBN (Electronic) | 9791095546344 |
Publication status | Published - 2020 |
Event | The 12th Language Resources and Evaluation Conference. LREC 2020 - Marseille, France; Duration: 11 May 2020 → 16 May 2020; Conference number: 12; https://lrec2020.lrec-conf.org/en/ |
Conference
Conference | The 12th Language Resources and Evaluation Conference. LREC 2020 |
---|---|
Number | 12 |
Country/Territory | France |
City | Marseille |
Period | 11/05/2020 → 16/05/2020 |
Keywords
- Annotation process
- Stigma annotation scheme
- Social media
- CNN
- N-grams