Predicting Stock Price Movements with Text Data using Labeling based on Financial Theory

Fredrik Ahnve, Kasper Fantenberg, Gustav Svensson, Daniel Hardt

Research output: Contribution to conference > Paper > Research > peer-review


We apply Natural Language Processing and supervised Machine Learning to predict stock price movements, based on approximately 30,000 ad-hoc disclosures issued by publicly traded companies on Nasdaq OMX Stockholm. Three different labeling methods, based on financial theory, are defined and assessed. The best results, using Logistic Regression and TF-IDF with character n-grams, achieve an increase of 6.3 percentage points above a majority class baseline. These results show that corporate ad-hoc disclosures, which are regulated to represent novel and value-relevant information, are particularly well-suited for this task. Furthermore, the most sophisticated labeling technique used, Jensen’s Alpha in the context of the Capital Asset Pricing Model, helps the model achieve its highest accuracy. The results therefore show that financial theory can help isolate the effect of an informational event on stock prices, improving the supervised Machine Learning approach. Finally, an algorithmic trading strategy is simulated with the best model, yielding positive abnormal returns.
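As an illustration of the labeling idea mentioned in the abstract, the following is a minimal sketch of how a disclosure could be labeled with Jensen's Alpha under the Capital Asset Pricing Model. It is not taken from the paper: the function names, the "up"/"down" label values, the zero threshold, the default risk-free rate of 0, and the sample returns are all assumptions made for this example. Beta is estimated as the covariance of stock and market returns divided by the variance of market returns over a hypothetical estimation window.

```python
from statistics import mean


def estimate_beta(stock_returns, market_returns):
    """Slope of stock returns on market returns: cov(stock, market) / var(market)."""
    mean_stock = mean(stock_returns)
    mean_market = mean(market_returns)
    cov = sum((s - mean_stock) * (m - mean_market)
              for s, m in zip(stock_returns, market_returns))
    var = sum((m - mean_market) ** 2 for m in market_returns)
    return cov / var


def jensens_alpha(stock_return, market_return, beta, risk_free=0.0):
    """Realized return minus the CAPM-expected return for the same period."""
    expected = risk_free + beta * (market_return - risk_free)
    return stock_return - expected


def label_disclosure(stock_return, market_return, beta, risk_free=0.0):
    """Hypothetical binary label: 'up' if the abnormal return is positive, else 'down'."""
    alpha = jensens_alpha(stock_return, market_return, beta, risk_free)
    return "up" if alpha > 0 else "down"


# Made-up estimation window in which the stock moves twice as much as the market.
market_history = [0.01, -0.02, 0.03, 0.00]
stock_history = [0.02, -0.04, 0.06, 0.00]
beta = estimate_beta(stock_history, market_history)

# Made-up event day: the stock gains 3% while the market gains 2%.
event_alpha = jensens_alpha(0.03, 0.02, beta)
event_label = label_disclosure(0.03, 0.02, beta)
```

In this made-up case the raw return is positive, but with a beta of about 2 the CAPM-expected return is about 4%, so the abnormal return is negative and the disclosure is labeled "down". This is the sense in which a theory-based label can separate the effect of the disclosure from the general market movement.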
Original language: English
Publication date: 2020
Number of pages: 8
Publication status: Published - 2020
Event: The 4th IEEE International Workshop on Big Data for Financial News and Data: Co-located with 2020 IEEE International Conference on Big Data (IEEE BigData 2020) - Online
Duration: 10 Dec 2020 to 13 Dec 2020
Conference number: 4


Workshop: The 4th IEEE International Workshop on Big Data for Financial News and Data

Bibliographical note

CBS Library does not have access to the material


  • Algorithmic trading
  • Machine learning
  • Stock price prediction
  • Ad-hoc disclosures
  • Natural language processing
  • Text mining
  • Finance
