Predicting Stock Price Movements with Text Data using Labeling based on Financial Theory

Fredrik Ahnve, Kasper Fantenberg, Gustav Svensson, Daniel Hardt*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Article in proceedings > Research > peer-review


We apply Natural Language Processing and supervised Machine Learning to predict stock price movements, based on approximately 30,000 ad-hoc disclosures issued by companies listed on Nasdaq OMX Stockholm. Three labeling methods, each grounded in financial theory, are defined and assessed. The best results, obtained with Logistic Regression on TF-IDF character n-gram features, improve accuracy by 6.3 percentage points over a majority-class baseline. These results show that corporate ad-hoc disclosures, which are regulated to contain novel and value-relevant information, are particularly well suited to this task. Furthermore, the most sophisticated labeling technique, Jensen's Alpha in the context of the Capital Asset Pricing Model, yields the model's highest accuracy. The results therefore show that financial theory can help isolate the effect of an informational event on stock prices, improving the supervised Machine Learning approach. Finally, an algorithmic trading strategy is simulated with the best model, yielding positive abnormal returns.
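The abstract does not spell out how the Jensen's Alpha labels are computed. The following is a minimal sketch of one common way to operationalize them, not the authors' implementation. It assumes daily returns, a constant per-period risk-free rate, a beta estimated by ordinary least squares over a pre-event estimation window, and a binary up/down label taken from the sign of the event-window alpha. The function names `estimate_beta`, `jensens_alpha` and `label_disclosure` are hypothetical, as are the window choices.

```python
from statistics import mean


def estimate_beta(stock_returns, market_returns, risk_free):
    """OLS slope of stock excess returns on market excess returns.

    Assumption: both lists cover the same pre-event estimation window and
    the risk-free rate is constant per period. Raises ZeroDivisionError if
    the market excess returns have zero variance.
    """
    xs = [m - risk_free for m in market_returns]
    ys = [s - risk_free for s in stock_returns]
    x_bar, y_bar = mean(xs), mean(ys)
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    var = sum((x - x_bar) ** 2 for x in xs)
    return cov / var


def jensens_alpha(event_stock_return, event_market_return, beta, risk_free):
    """Realized return minus the CAPM-expected return for the event window."""
    expected = risk_free + beta * (event_market_return - risk_free)
    return event_stock_return - expected


def label_disclosure(alpha):
    """Hypothetical binary label: 1 if the disclosure beat CAPM, else 0."""
    return 1 if alpha > 0 else 0


if __name__ == "__main__":
    # Illustrative numbers only, not data from the paper.
    beta = estimate_beta([0.02, -0.04, 0.06, 0.0], [0.01, -0.02, 0.03, 0.0], risk_free=0.0)
    alpha = jensens_alpha(0.01, 0.01, beta, risk_free=0.0)
    print(label_disclosure(alpha))  # prints 0
```

The example shows why this labeling can differ from raw returns: the stock rises 1% on the disclosure day, but with a beta of about 2 and the market also up 1%, CAPM expects roughly 2%, so the disclosure is labeled as negative news once market-wide movement is removed.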
Original language: English
Title of host publication: Proceedings - 2020 IEEE International Conference on Big Data. Big Data 2020
Editors: Xintao Wu, Chris Jermaine, Li Xiong, Xiaohua Hu, Olivera Kotevska, Siyuan Lu, Weija Xu, Srinivas Aluru, Chengxiang Zhai, Eyhab Al-Masri, Zhiyuan Chen, Jeff Saltz
Number of pages: 8
Place of publication: Los Alamitos, CA
Publication date: 2020
Article number: 9378054
ISBN (Print): 9781728162522
ISBN (Electronic): 9781728162515
Publication status: Published - 2020
Event: Eighth IEEE International Conference on Big Data. IEEE BigData 2020 - Virtual Event
Duration: 10 Dec 2020 to 13 Dec 2020
Conference number: 8


Conference: Eighth IEEE International Conference on Big Data. IEEE BigData 2020
Location: Virtual Event


  • Algorithmic trading
  • Machine learning
  • Stock price prediction
  • Ad-hoc disclosures
  • Natural language processing
  • Text mining
  • Finance
