This study examines the extent to which models based on automated textual analysis of 10-K and 10-Q filings can be used to enhance the accuracy of analysts’ earnings per share (EPS) forecasts. The past decades have witnessed significant increases in computational power and an explosion of digitally available text. Researchers in accounting and finance have exploited these developments in attempts to predict events such as bankruptcies and fraud from textual sources such as corporate disclosures and online news. However, research on analysts’ forecasts seems to have focused more on behavioral aspects, e.g. analyst biases and the ability of analysts to incorporate information, than on testing whether the significant increase in available information, and the development of tools for utilizing it, can be applied to enhance analysts’ forecast accuracy. Using DataRobot, a state-of-the-art machine learning platform, and the textual content of 10-K and 10-Q filings, directional models are constructed for each of the six sector-specific subsamples included. These models predict whether consensus will over- or underestimate EPS in the following quarter. They are built on the textual content and submission dates of filings by S&P 500 companies from 2012 to 2017, together with the corresponding historical earnings surprise data. It is found that analysts who are not data scientists are able to significantly enhance the accuracy of their EPS forecasts by implementing automated textual analysis as suggested in this study. The findings indicate that 10-K and 10-Q filings contain information that analysts fail to fully incorporate in their forecasts, and that the suggested tool is able to identify patterns in such forecasting difficulties.
Thus, besides providing analysts with predictions of the directional shifts they should make to enhance their forecast accuracy, the output of the models can benefit analysts by possibly identifying information that they seem to fail to comprehend. These findings are concluded to be robust to differences in market capitalization, analyst coverage, document length, and the number of quarters used as training data in the modeling. Lastly, an extensive literature review and analyses of model performance illustrate that no one type of model is consistently superior across or within different predictive tasks.
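
The directional target described above can be illustrated with a minimal sketch. It assumes that the earnings surprise is measured as actual EPS minus consensus EPS; the thesis's exact definition, its text features, and its DataRobot workflow are not reproduced here. The `Quarter` type, the function names, and the sample values are hypothetical and serve only as illustration.

```python
from dataclasses import dataclass


@dataclass
class Quarter:
    """One firm-quarter observation (hypothetical structure, for illustration)."""
    ticker: str
    consensus_eps: float  # analysts' consensus forecast
    actual_eps: float     # reported EPS


def direction(q: Quarter) -> str:
    """Label whether consensus under- or overestimated EPS.

    Assumption: surprise = actual EPS - consensus EPS.
    """
    surprise = q.actual_eps - q.consensus_eps
    if surprise > 0:
        return "under"  # consensus was too low
    if surprise < 0:
        return "over"   # consensus was too high
    return "met"


def directional_accuracy(predicted: list[str], quarters: list[Quarter]) -> float:
    """Share of quarters where the predicted direction matches the realized one."""
    hits = sum(p == direction(q) for p, q in zip(predicted, quarters))
    return hits / len(quarters)


# Made-up illustrative values, not data from the thesis.
quarters = [
    Quarter("AAA", consensus_eps=1.00, actual_eps=1.10),
    Quarter("BBB", consensus_eps=0.50, actual_eps=0.45),
    Quarter("CCC", consensus_eps=2.00, actual_eps=2.20),
]
labels = [direction(q) for q in quarters]
naive_accuracy = directional_accuracy(["under"] * len(quarters), quarters)
print(labels, naive_accuracy)
```

A text-based classifier would replace the naive always-"under" prediction used here; comparing against such a naive baseline is one simple way to judge whether a directional model adds information.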
|Educations||MSc in Finance and Accounting, (Graduate Programme) Final Thesis|
|Number of pages||171|
|Supervisors||Thomas Plenborg & Thomas Riise Johansen|