How Can a Machine Learning-based LDA Model Help in Literature Search in Systematic Literature Review?

Research output: Chapter in Book/Report/Conference proceeding › Book chapter › Research › peer-review

Abstract

The systematic literature review (SLR) is an important method for summarizing previous research findings, and as such, it is relevant to both scholars and practitioners. The critical decision in an SLR is determining the keywords that define the articles to be analyzed further. Even when the keywords are carefully selected, the SLR remains prone to bias, because human error in keyword selection may lead to omissions in further analysis. In addition, the number of articles published each year is increasing exponentially and studies are becoming more interdisciplinary, making it increasingly difficult to identify all relevant articles. In this study, we show how machine-learning algorithms can help identify relevant articles by using the latent Dirichlet allocation (LDA) model. This model is based on an unsupervised machine-learning process that enables the identification of articles and topics based on the semantic similarity of the entire article (body) text rather than only keywords. We demonstrate the application of the LDA method on the COVID-19 Open Research Dataset (CORD-19) database of over 750,000 scientific articles. We describe the main features of the LDA method and provide step-by-step instructions so that readers without a technical background can understand the LDA process. Finally, we provide access to the model trained on the CORD-19 database, which enables rapid identification of marketing and management research topics within the database, together with a set of “do-it-yourself” options that can help non-technical readers in their initial exercises with LDA.
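
To make the idea of grouping articles by topic rather than by keyword more concrete, the following is a minimal sketch of LDA fitted with collapsed Gibbs sampling, written in plain Python. It is not the chapter's implementation and does not use CORD-19: the toy corpus, the function name `fit_lda`, the number of topics, and the hyperparameters `alpha` and `beta` are all assumptions chosen for illustration. In practice, a library implementation and proper text preprocessing would be used on full article texts.

```python
import random


def fit_lda(docs, num_topics=2, alpha=0.1, beta=0.01, iterations=200,
            top_n=3, seed=0):
    """Fit LDA by collapsed Gibbs sampling on tokenized documents.

    Returns (theta, top_words): theta[d][k] is the estimated share of
    topic k in document d, and top_words[k] lists the most frequent
    words assigned to topic k. All defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    vocab = sorted({word for doc in docs for word in doc})
    word_id = {word: i for i, word in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]               # tokens per (doc, topic)
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # tokens per (topic, word)
    topic_total = [0] * num_topics                             # tokens per topic
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        doc_assignments = []
        for word in doc:
            k = rng.randrange(num_topics)
            doc_assignments.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[word]] += 1
            topic_total[k] += 1
        assignments.append(doc_assignments)

    # Resample each token's topic, conditioned on all other assignments.
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                w = word_id[word]
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    theta = [
        [(count + alpha) / (len(doc) + num_topics * alpha) for count in counts]
        for counts, doc in zip(doc_topic, docs)
    ]
    top_words = [
        [vocab[i] for i in sorted(range(vocab_size),
                                  key=lambda i: -topic_word[k][i])[:top_n]]
        for k in range(num_topics)
    ]
    return theta, top_words


# Hypothetical toy corpus: each "article" is already tokenized.
docs = [
    "vaccine trial immune response vaccine dose".split(),
    "immune response antibody vaccine trial".split(),
    "consumer brand marketing pandemic retail".split(),
    "retail marketing consumer behavior brand".split(),
]
theta, top_words = fit_lda(docs)
for k, words in enumerate(top_words):
    print(f"topic {k}: {words}")
for d, shares in enumerate(theta):
    print(f"document {d}: {[round(s, 2) for s in shares]}")
```

Each row of `theta` is a document's topic mixture, so articles with similar mixtures can be retrieved together even when they share no search keyword. With a corpus this small the resulting topics depend heavily on the random seed and hyperparameters, so the output should be read as a demonstration of the mechanics only.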

Original language: English
Title of host publication: How to Achieve Societal Impact Through Engaged and Collaborative Scholarship: A Guide to Purposeful Marketing Research
Editors: Michel van der Borgh, Adam Lindgreen, Tobias Schäfers
Number of pages: 21
Place of Publication: Cheltenham
Publisher: Edward Elgar Publishing
Publication date: 2024
Pages: 190-210
Chapter: 10
ISBN (Print): 9781800888524
ISBN (Electronic): 9781800888531
Publication status: Published - 2024
Series: How To Guides

Keywords

  • Systematic literature review
  • Machine learning
  • Latent Dirichlet allocation
  • LDA
  • COVID-19 Open Research Dataset
