Stigma Annotation Scheme and Stigmatized Language Detection in Health-Care Discussions on Social Media

Nadiya Straton, Hyeju Jang, Raymond Ng

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review


Abstract

Much research in the social sciences has examined the interpretation of stigma and its influence on human behaviour and health. Stigma results in out-of-group exclusion, distancing, cognitive separation, status loss, discrimination, and in-group pressure, and often leads to disengagement and non-adherence to treatment plans and doctors' prescriptions. However, little work has been conducted on the computational identification of stigma in general, and in social media discourse in particular. In this paper, we develop an annotation scheme and improve the annotation process for stigma identification, both of which can be applied to other health-care domains. Data from pro-vaccination and anti-vaccination discussion groups are annotated by trained annotators with professional backgrounds in social science and health-care studies; this group can therefore be considered expert on the subject in comparison to a non-expert crowd. Amazon MTurk annotators form a second group; because their educational background is unknown, they are initially treated as a non-expert crowd on the subject matter of stigma. We analyze the annotations with visualisation techniques and features from the LIWC (Linguistic Inquiry and Word Count) list, and make predictions based on bi-grams with traditional and deep learning models. A data augmentation method combined with a CNN achieves high accuracy in comparison to the other models. The high prediction rate achieved with the CNN reconfirms the success of the rigorous annotation process for identifying stigma.
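The abstract mentions prediction based on bi-grams. As a rough illustration only, the sketch below shows one common way to turn a post into bi-gram count features using the Python standard library. The function name `bigram_counts`, the tokenisation rule, and the example sentence are assumptions made for this sketch; they are not taken from the paper and do not represent the authors' pipeline.

```python
import re
from collections import Counter


def bigram_counts(text):
    """Return a Counter of adjacent lowercase word pairs found in `text`.

    Tokenisation here is a simple assumption: runs of letters and
    apostrophes, lowercased. The paper may use a different scheme.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    # Pair each token with its successor; empty or one-word input yields no pairs.
    return Counter(zip(tokens, tokens[1:]))


# Hypothetical example post, not drawn from the paper's data set.
post = "Vaccines are safe and vaccines are tested"
features = bigram_counts(post)

for pair, count in features.most_common():
    print(pair, count)
```

Counts of this kind could then be passed to a classifier as features; which models and settings the authors used is described in the paper itself.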
Original language: English
Title of host publication: Proceedings of The 12th Language Resources and Evaluation Conference (LREC 2020)
Editors: Nicoletta Calzolari, Frederic Bechet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Number of pages: 13
Place of Publication: Paris
Publisher: European Language Resources Association
Publication date: 2020
Pages: 1178-1190
ISBN (Electronic): 9791095546344
Publication status: Published - 2020
Event: The 12th Language Resources and Evaluation Conference. LREC 2020 - Marseille, France
Duration: 11 May 2020 to 16 May 2020
Conference number: 12
https://lrec2020.lrec-conf.org/en/

Conference

Conference: The 12th Language Resources and Evaluation Conference. LREC 2020
Number: 12
Country/Territory: France
City: Marseille
Period: 11/05/2020 to 16/05/2020

Keywords

  • Annotation process
  • Stigma annotation scheme
  • Social media
  • CNN
  • N-grams
