Stigma Annotation Scheme and Stigmatized Language Detection in Health-Care Discussions on Social Media

Nadiya Straton, Hyeju Jang, Raymond Ng

Publication: Contribution to book/anthology/report, conference contribution in proceedings, research, peer-reviewed

Much research has been done within the social sciences on the interpretation and influence of stigma on human behaviour and health. Stigma results in out-group exclusion, distancing, cognitive separation, status loss, discrimination, and in-group pressure, and often leads to disengagement and non-adherence to treatment plans and doctors' prescriptions. However, little work has been conducted on the computational identification of stigma in general, and in social media discourse in particular. In this paper, we develop an annotation scheme and improve the annotation process for stigma identification, which can be applied to other health-care domains. Data from pro-vaccination and anti-vaccination discussion groups are annotated by trained annotators with professional backgrounds in social science and health-care studies; this group can therefore be considered expert on the subject in comparison with a non-expert crowd. Amazon MTurk annotators form a second group; because their educational background is unknown, they are initially treated as a non-expert crowd on the subject of stigma. We analyze the annotations with visualisation techniques and features from the LIWC (Linguistic Inquiry and Word Count) list, and make predictions based on bi-grams with traditional and deep learning models. The data augmentation method and the application of a CNN show high accuracy in comparison with the other models. The success of the rigorous annotation process in identifying stigma is reconfirmed by the high prediction rate achieved with the CNN.
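As a rough illustration of the bi-gram features mentioned in the abstract, the following minimal sketch extracts and counts word bi-grams from a short text. It is not the authors' pipeline: the function names (`tokenize`, `bigrams`, `bigram_counts`), the simple whitespace tokenizer, and the example sentence are all assumptions made for illustration only.

```python
from collections import Counter

# Punctuation stripped from the edges of each token (an illustrative choice).
_PUNCT = ".,!?;:\"'()"

def tokenize(text):
    """Lowercase the text, split on whitespace, and strip edge punctuation."""
    tokens = (tok.strip(_PUNCT) for tok in text.lower().split())
    return [tok for tok in tokens if tok]

def bigrams(tokens):
    """Return the list of adjacent token pairs (word bi-grams)."""
    return list(zip(tokens, tokens[1:]))

def bigram_counts(text):
    """Count how often each bi-gram occurs in the text."""
    return Counter(bigrams(tokenize(text)))

if __name__ == "__main__":
    # Invented example sentence, not taken from the annotated data.
    counts = bigram_counts("Vaccines are safe. Vaccines are tested.")
    for pair, n in counts.most_common():
        print(pair, n)
```

Counts of this kind could then be turned into a feature vector per post and passed to a classifier; which vectorisation and models the paper actually uses is not specified in this record beyond "traditional and deep learning models".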
Title: Proceedings of The 12th Language Resources and Evaluation Conference (LREC 2020)
Editors: Nicoletta Calzolari, Frederic Bechet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Number of pages: 13
Publisher: European Language Resources Association
ISBN (Electronic): 9791095546344
Status: Published - 2020
Event: The 12th Language Resources and Evaluation Conference. LREC 2020 - Marseille, France
Duration: 11 May 2020 to 16 May 2020
Conference number: 12


Conference: The 12th Language Resources and Evaluation Conference. LREC 2020


  • Annotation process
  • Stigma annotation scheme
  • Social media
  • CNN
  • N-grams