TY - CHAP
T1 - How Can a Machine Learning-based LDA Model Help in Literature Search in Systematic Literature Review?
AU - Akagić, Amila
AU - Kadic-Maglajlic, Selma
PY - 2024
Y1 - 2024
N2 - The systematic literature review (SLR) is an important method for summarizing previous research findings and, as such, is relevant to both scholars and practitioners. The critical decision in an SLR is determining the keywords that define which articles are analyzed further. Although keywords are carefully selected, an SLR can be highly biased owing to human error in keyword selection, which may lead to omissions in subsequent analysis. In addition, the number of articles published each year is increasing exponentially and studies are becoming more interdisciplinary, making it increasingly difficult to identify all relevant articles. In this study, we show how machine-learning algorithms can help identify relevant articles by using the latent Dirichlet allocation (LDA) model. This model is based on an unsupervised machine-learning process that identifies articles and topics from the semantic similarity of the entire article (body) text rather than from keywords alone. We demonstrate the application of the LDA method on the COVID-19 Open Research Dataset (CORD-19), a database of over 750,000 scientific articles. We describe the main features of the LDA method and provide step-by-step instructions so that readers without a technical background can understand the LDA process. Finally, we provide access to the model trained on the CORD-19 database, which enables rapid identification of marketing and management research topics within the database, along with a set of “do-it-yourself” options that can help non-technical readers in their initial exercises with LDA.
KW - Systematic literature review
KW - Machine learning
KW - Latent Dirichlet allocation
KW - LDA
KW - COVID-19 Open Research Dataset
U2 - 10.4337/9781800888531.00020
DO - 10.4337/9781800888531.00020
M3 - Book chapter
SN - 9781800888524
T3 - How To Guides
SP - 190
EP - 210
BT - How to Achieve Societal Impact Through Engaged and Collaborative Scholarship
A2 - van der Borgh, Michel
A2 - Lindgreen, Adam
A2 - Schäfers, Tobias
PB - Edward Elgar Publishing
CY - Cheltenham
ER -