The study presents the first computational model of COVID vaccine stigma that identifies stigmatised sentiment with a high level of accuracy and generalises well across several social media platforms. The aim of the study is to understand the lexical features that are prevalent in COVID vaccine discourse and in disputes between anti-vaccine and pro-vaccine groups. This should give healthcare authorities better insight into those discussions and help them navigate them. The study collected English-language posts, and the comments on them, related to COVID vaccine sentiment from Reddit, Twitter, and YouTube for the period from April 2020 to March 2021. The labels used in the model, “stigma”, “not stigma”, and “undefined”, were collected from a smaller Facebook (Meta) dataset and successfully propagated to the larger dataset from Reddit, Twitter, and YouTube. The success of the propagation task and of the subsequent classification results from a state-of-the-art annotation scheme and annotated dataset. Deep learning with pre-trained word vector embeddings significantly outperformed traditional algorithms according to a two-tailed t-test, and achieved an F1 score of 0.794 on the three-class classification task. Stigmatised text in COVID anti-vaccine discourse is characterised by high levels of subjectivity, negative sentiment, anxiety, anger, risk, and healthcare references. After the first half of 2020, anti-vaccination stigma sentiment often appears in comments on posts that attempt to disprove COVID vaccine conspiracy theories. This is inconsistent with previous research findings, in which anti-vaccine people stayed primarily within their own in-group discussions. This shift in the behaviour of the anti-vaccine movement, from affirming climates to ones with opposing opinions, is discussed and elaborated further in the study.
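For readers unfamiliar with the metric, the sketch below shows one common way to compute an F1 score over the three labels named above. It is illustrative only: the abstract does not say how the reported score was averaged across classes, so macro-averaging is an assumption here, and the function names and toy labels are hypothetical rather than taken from the study.

```python
# Illustrative sketch only: macro-averaged F1 over three classes.
# ASSUMPTION: macro-averaging; the abstract does not state the averaging scheme.
# The toy labels below are invented and are not data from the study.

LABELS = ("stigma", "not stigma", "undefined")


def per_class_f1(y_true, y_pred, label):
    """F1 for one class, treating `label` as positive and all others as negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined), so F1 is 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of the per-class F1 scores."""
    return sum(per_class_f1(y_true, y_pred, label) for label in labels) / len(labels)


# Hypothetical gold labels and predictions for six comments.
y_true = ["stigma", "stigma", "not stigma", "not stigma", "undefined", "undefined"]
y_pred = ["stigma", "not stigma", "not stigma", "not stigma", "undefined", "stigma"]

score = macro_f1(y_true, y_pred)
print(f"macro F1 = {score:.3f}")
```

Macro-averaging weights each class equally regardless of how many examples it has. A weighted or micro average would give a different number on imbalanced data.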
Bibliographical note: Published online 07 December 2022.
- Deep learning
- Social media