The Power of Online Text Data: Leveraging Social Media & Machine Learning to Generate Insights and Make Decisions

Jamie Lee Jackson & Makenzie McGraw

Student thesis: Master's thesis

Abstract

The primary question of this research is how social media can be used to gain insight into price movements. Its secondary purpose is to scrutinize the methodological choices made by researchers in machine learning and text analytics, and how they have applied their methods to comparable questions. The topic is important because of the growing availability of information online and the question of how well we can currently interpret and process it. As we find in this research, text data found online can be immensely useful but is difficult to utilize properly. From the moment we began this paper, we recognized the extent of the controversy surrounding price analysis and the predictability of the stock market. As researchers, we wanted to reach our own conclusion about the plausibility of the task. To address these controversies, we discuss the efficient market hypothesis (EMH), random walk theory, and adaptive markets theory, as well as other theories from behavioral finance. We found that a discussion of these areas was necessary to understand the possibilities and limitations of prediction in the stock market. Having established and discussed these theories, we critically evaluated work from the past twenty years on stock market analysis and social media, covering both the data used and the methodological approaches taken. The insights from these earlier works played a major role in how we approached the primary question of this research. Once we had established how to collect and analyze our data, we proceeded to the methodology. In this research we use data from both Twitter and Yahoo! Finance. We used existing datasets of Elon Musk's tweets and of financial news from 2010-2020.
Additionally, our data included historical stock data for Tesla (TSLA) and the NASDAQ Composite Index (^IXIC) for the same years. Our methodology uses various combinations of these four datasets: we compare Musk's tweets against both TSLA and ^IXIC, and the financial news data against both TSLA and ^IXIC. We then framed the problem as a classification-based machine learning task, using a logistic regression and a neural network. Overall, our results suggested that this task is not feasible, at least with our resources. However, we did see significantly better performance with the combination of the financial news data and ^IXIC than with any of the other combinations. Perhaps, if information could be interpreted more efficiently, short-term prediction would be possible in theory. However, we do not possess the foresight to know how the markets will behave ten to twenty years from now.
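
To make the classification setup described above more concrete, the following is a minimal sketch of how dated text could be paired with next-day price direction and fed to a logistic regression. It is not the implementation used in the thesis: the toy texts and prices are invented for illustration, the bag-of-words features and the "next close higher than today's close" label are assumptions, and all function names (`label_next_day_moves`, `train_logreg`, and so on) are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy data, invented for illustration only.
texts = {
    "2020-01-02": "record deliveries strong demand",
    "2020-01-03": "recall lawsuit weak demand",
    "2020-01-06": "strong earnings record profit",
    "2020-01-07": "lawsuit recall investigation",
}
closes = {
    "2020-01-02": 100.0,
    "2020-01-03": 104.0,
    "2020-01-06": 101.0,
    "2020-01-07": 105.0,
    "2020-01-08": 102.0,
}

def label_next_day_moves(texts, closes):
    """Pair each day's text with 1 if the next trading day's close is higher, else 0."""
    days = sorted(closes)
    next_day = dict(zip(days, days[1:]))  # the last trading day has no successor
    rows = []
    for day in sorted(texts):
        if day in next_day:
            up = 1 if closes[next_day[day]] > closes[day] else 0
            rows.append((texts[day], up))
    return rows

def build_vocab(docs):
    """Sorted list of all distinct whitespace-separated tokens."""
    return sorted({word for doc in docs for word in doc.split()})

def vectorize(doc, vocab):
    """Bag-of-words count vector over a fixed vocabulary (unknown words are ignored)."""
    counts = Counter(doc.split())
    return [float(counts[word]) for word in vocab]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, lr=0.5, epochs=200):
    """Fit logistic regression weights with plain stochastic gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # gradient of the log loss with respect to the logit
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def predict(x, w, b):
    """Return 1 (up) if the predicted probability is at least 0.5, else 0 (down)."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0

rows = label_next_day_moves(texts, closes)
docs = [doc for doc, _ in rows]
labels = [up for _, up in rows]
vocab = build_vocab(docs)
X = [vectorize(doc, vocab) for doc in docs]
w, b = train_logreg(X, labels)
train_predictions = [predict(x, w, b) for x in X]
```

The toy data is deliberately separable, so a perfect fit here says nothing about real markets. A realistic evaluation would need a held-out test period that follows the training period in time, which is where results such as those reported above tend to show how hard the task is.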

Programme: MSc in Business Administration and E-business (Master's programme), Final thesis
Language: English
Publication date: 2021
Number of pages: 122
Supervisor: Liana Razmerita