Predicting Stock Performance Using 10-K Filings: A Natural Language Processing Approach Employing Convolutional Neural Networks

Kasper Regenburg Jønsson & Jonas Burup Jakobsen

Student thesis: Master thesis


This paper aims to predict company-specific stock performance from the textual elements of 10-K filings. Because investors have limited information-processing capacity, they need time to incorporate the information contained in the text of 10-K filings into market prices. This delay creates an opportunity for investors to earn abnormal returns using automated text analysis. Using word embeddings to represent the text as input to a convolutional neural network (CNN), we analyze the text of over 29,000 10-K filings from 2010 to 2017. We find that company-specific stock performance is predictable. Furthermore, we control the results for known risk factors using the Fama-French five-factor model and find that investors are able to generate significant risk-adjusted returns based on the classifications of the CNN. Based on these findings, we propose several implications. First, we confirm that the textual elements of 10-K filings contain information that investors currently do not fully utilize. Second, we contribute to the validity of using deep learning models to predict company-specific performance. Lastly, we provide a practical tool for investors, regulatory entities, and the respective companies to analyze the textual elements of 10-K filings.

Education: MSc in Finance and Investments (Graduate Programme), Final Thesis
Publication date: 2018
Number of pages: 97
Supervisors: Thomas Plenborg & Thomas Riise Johansen