Abstract
The Information Systems (IS) field is increasingly engaging in computationally intensive research (Berente et al., 2019; Miranda et al., 2022). These projects often build on digital trace data, frequently collected with web scraping techniques from online environments such as social media or websites (Boegershausen et al., 2022; Miranda et al., 2022). This ever-growing trove of data offers unparalleled opportunities for researchers. Yet, we know little about the crucial link between web-scraped digital trace data and subsequent computationally intensive theory development. Researchers can use web scraping (web crawlers/web spiders) and/or application programming interfaces (APIs) to automatically collect information from websites. However, the use of web scraping is highly sensitive in terms of the practices applied (Boegershausen et al., 2022), and little is known about its subsequent impact on theory development. Although the IS field was once at the forefront of discussing the use of data stemming from online environments (Allen et al., 2006), little research has followed since. Consequently, questions about how to use web scraping systematically, and about how choices made during the scraping process affect theory development, remain unanswered.
We set out to understand current web scraping practices in the IS field. To this end, we collected and analyzed 176 papers from the leading four IS journals. Our exploratory approach yielded challenging findings. We find that web scraping practices are only vaguely described, which limits potential replications. We also see that ethical challenges and data rights are barely covered. Additionally, our study revealed a strongly skewed distribution, with only a handful of data sources (e.g., Twitter, Amazon) accounting for the overwhelming majority of publications. As such, we see grounds for discussing these choices, offering guidelines, and contributing to the literature on computationally intensive theory development.
Original language | English |
---|---|
Title of host publication | ICIS 2024 TREOS |
Editors | Hope Koch, Peter Ractham, Heinz-Theo Wagner |
Number of pages | 1 |
Place of Publication | Atlanta, GA |
Publisher | Association for Information Systems. AIS Electronic Library (AISeL) |
Publication date | 2024 |
Article number | 85 |
Publication status | Published - 2024 |
Event | The 45th International Conference on Information Systems. ICIS 2024: Digital Platforms for Emerging Societies - Bangkok Marriott Marquis Queen’s Park, Bangkok, Thailand; Duration: 15 Dec 2024 → 18 Dec 2024; Conference number: 45; https://icis2024.aisconferences.org/ |
Conference
Conference | The 45th International Conference on Information Systems. ICIS 2024 |
---|---|
Number | 45 |
Location | Bangkok Marriott Marquis Queen’s Park |
Country/Territory | Thailand |
City | Bangkok |
Period | 15/12/2024 → 18/12/2024 |
Internet address |