Predicting Stock Price Movements with Text Data using Labeling based on Financial Theory

Fredrik Ahnve, Kasper Fantenberg, Gustav Svensson, Daniel Hardt*

*Corresponding author for this work

Publication: Contribution to book/anthology/report; Conference contribution in proceedings; Research; Peer-reviewed

Abstract

We apply Natural Language Processing and supervised Machine Learning to predict stock price movements, based on approximately 30,000 ad-hoc disclosures issued by publicly traded companies on Nasdaq OMX Stockholm. Three different labeling methods, based on financial theory, are defined and assessed. The best results, using Logistic Regression and TF-IDF with character n-grams, achieve an increase of 6.3 percentage points above a majority class baseline. These results show that corporate ad-hoc disclosures, which are regulated to represent novel and value-relevant information, are particularly well-suited for this task. Furthermore, the most sophisticated labeling technique used, Jensen’s Alpha in the context of the Capital Asset Pricing Model, helps the model achieve its highest accuracy. The results therefore show that financial theory can help isolate the effect of an informational event on stock prices, improving the supervised Machine Learning approach. Finally, an algorithmic trading strategy is simulated with the best model, yielding positive abnormal returns.
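As an illustration of the labeling idea mentioned in the abstract, the following is a minimal sketch of how a Jensen's Alpha label could be computed under the Capital Asset Pricing Model. It is not the authors' implementation: the function names, the "up"/"down" label names, the zero threshold, the estimation window, and the sample returns are all assumptions made for this example. Alpha is taken as the realised return minus the CAPM-expected return, R_f + beta * (R_m - R_f).

```python
from statistics import mean


def estimate_beta(stock_returns, market_returns):
    """Estimate beta as cov(stock, market) / var(market) over a window.

    Assumption: both lists are aligned periodic returns of equal length.
    """
    mean_stock = mean(stock_returns)
    mean_market = mean(market_returns)
    covariance = sum(
        (s - mean_stock) * (m - mean_market)
        for s, m in zip(stock_returns, market_returns)
    )
    variance = sum((m - mean_market) ** 2 for m in market_returns)
    return covariance / variance


def jensens_alpha(stock_return, market_return, risk_free_rate, beta):
    """Realised return minus the return CAPM predicts for the same period."""
    expected = risk_free_rate + beta * (market_return - risk_free_rate)
    return stock_return - expected


def label_disclosure(alpha, threshold=0.0):
    """Hypothetical binary label: 'up' if the abnormal return exceeds threshold."""
    return "up" if alpha > threshold else "down"


# Made-up returns for illustration only.
history_stock = [0.02, -0.01, 0.03, 0.00]
history_market = [0.01, -0.005, 0.015, 0.0]
beta = estimate_beta(history_stock, history_market)

# Return observed around a disclosure, compared with the market's return.
alpha = jensens_alpha(
    stock_return=0.03, market_return=0.01, risk_free_rate=0.0, beta=beta
)
label = label_disclosure(alpha)
```

Labeling on alpha rather than on the raw return removes the part of the price move that the market's own movement explains, which is the sense in which financial theory helps isolate the effect of the disclosure.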
Original language: English
Title: Proceedings - 2020 IEEE International Conference on Big Data. Big Data 2020
Editors: Xintao Wu, Chris Jermaine, Li Xiong, Xiaohua Hu, Olivera Kotevska, Siyuan Lu, Weija Xu, Srinivas Aluru, Chengxiang Zhai, Eyhab Al-Masri, Zhiyuan Chen, Jeff Saltz
Number of pages: 8
Place of publication: Los Alamitos, CA
Publisher: IEEE
Publication date: 2020
Pages: 4365-4372
Article number: 9378054
ISBN (Print): 9781728162522
ISBN (Electronic): 9781728162515
Status: Published - 2020
Event: Eighth IEEE International Conference on Big Data. IEEE BigData 2020 - Virtual Event
Duration: 10 Dec 2020 to 13 Dec 2020
Conference number: 8
https://bigdataieee.org/BigData2020/

Conference

Conference: Eighth IEEE International Conference on Big Data. IEEE BigData 2020
Number: 8
Location: Virtual Event
Period: 10/12/2020 to 13/12/2020

Keywords

  • Algorithmic trading
  • Machine learning
  • Stock price prediction
  • Ad-hoc disclosures
  • Natural language processing
  • Text mining
  • Finance
