This study mines approximately 30,000 textual corporate ad-hoc disclosures issued over the last ten years by companies listed on Nasdaq OMX Stockholm. Natural Language Processing methods are combined with supervised Machine Learning models to predict stock price movements after the disclosures are published. The study follows an event study structure and evaluates three labeling methods grounded in financial theory. Of these, Jensen's Alpha in the context of the Capital Asset Pricing Model, the most sophisticated model used, yields the labels that best support supervised learning. This indicates that financial models can help isolate the effect of an informational event on stock prices. The results show that the best-performing Machine Learning model combines TF-IDF on character n-grams as the text pre-processing method with Logistic Regression as the classifier. This model outperforms the ZeroR baseline by 6.3 percentage points. Finally, an algorithmic trading strategy based on the model is simulated to evaluate whether it can generate significant positive abnormal returns on the stock market. Several of the simulated strategies produce positive abnormal returns, but none are statistically significant at the 0.05 level. The study identifies many areas where both the Machine Learning model and the trading strategy could be improved, and these are relevant for future research on stock price prediction from textual data.
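To make the labeling idea concrete, the sketch below shows one common way to turn CAPM-based abnormal returns into supervised labels, together with the ZeroR baseline the model is compared against. It is a minimal illustration under stated assumptions, not the study's actual procedure: beta is estimated by OLS on excess returns over a pre-event estimation window, each event-window day's abnormal return is the realized excess return minus beta times the market excess return (an ex-post Jensen's alpha), and a disclosure is labeled 1 if the summed abnormal return is positive. The window lengths, the zero threshold and the binary up/down labeling are assumptions, and all function names are hypothetical.

```python
from collections import Counter
from statistics import mean


def excess(returns, rf):
    """Returns in excess of the risk-free rate, day by day."""
    return [r - f for r, f in zip(returns, rf)]


def estimate_capm(stock, market, rf):
    """OLS fit of (R_i - R_f) = alpha + beta * (R_m - R_f) over the estimation window."""
    y, x = excess(stock, rf), excess(market, rf)
    mx, my = mean(x), mean(y)
    beta = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    alpha = my - beta * mx
    return alpha, beta


def jensen_alpha_car(beta, stock, market, rf):
    """Cumulative abnormal return over the event window.

    Each day's abnormal return is the realized excess return minus the
    CAPM-expected excess return beta * (R_m - R_f), i.e. an ex-post Jensen's alpha.
    """
    return sum((r - f) - beta * (m - f) for r, m, f in zip(stock, market, rf))


def label_disclosure(car, threshold=0.0):
    """Hypothetical binary label: 1 if the disclosure is followed by a CAR above the threshold."""
    return 1 if car > threshold else 0


def zero_r_accuracy(labels):
    """ZeroR baseline: accuracy of always predicting the most frequent class."""
    return max(Counter(labels).values()) / len(labels)


if __name__ == "__main__":
    # Illustrative numbers only: a stock with beta 1.5 in the estimation window.
    mkt_est = [0.01, -0.02, 0.015, 0.005, -0.01]
    rf_est = [0.0001] * 5
    stk_est = [f + 0.001 + 1.5 * (m - f) for m, f in zip(mkt_est, rf_est)]
    _, beta = estimate_capm(stk_est, mkt_est, rf_est)

    # Three days after a (hypothetical) disclosure.
    car = jensen_alpha_car(beta, [0.05, 0.02, 0.0], [0.02, 0.01, -0.005], [0.0001] * 3)
    print("beta:", beta, "CAR:", car, "label:", label_disclosure(car))
    print("ZeroR accuracy:", zero_r_accuracy([1, 0, 1, 1]))
```

The point of subtracting the beta-scaled market return is that a disclosure released on a day when the whole market rises should not be labeled "positive" merely because the stock moved with the market. Only the part of the move that CAPM does not explain counts towards the label. A classifier's accuracy on such labels is then judged against `zero_r_accuracy`, which is the score obtained without reading the text at all.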
|Programmes||Cand.merc.it Business Administration and Information Systems (Master's programme), final thesis|