What Are They Talking About? Analyzing the Predictive Power of Earnings Call Transcripts through Topic Modeling

Jozef Polák & Martin Slezák

Student thesis: Master thesis


The field of machine learning has been evolving rapidly. Relatively recently, natural language processing, a branch of machine learning focused on human language, has opened up new ways of exploring textual data. With topic modeling, one can process a large body of text and determine its topic composition, drawing specific insights about the content of each document. With this in mind, the main objective of this thesis is to determine whether topics derived from earnings conference calls provide additional predictive power for models forecasting changes in revenues and profitability. Two topic modeling techniques, Latent Dirichlet Allocation (LDA) and Non-negative Matrix Factorization (NMF), are applied to the conference call transcripts to derive topics that may characterize particular themes. These topics are then supplied to classification models as additional independent variables, in order to test whether they improve the accuracy of frameworks predicting upcoming changes in revenues and in the net income to sales ratio.
The findings indicate that including topics drawn from these conference calls improves the prediction accuracy of the classification models in most cases. Neither topic modeling technique outperforms the other; both lead to similar improvements. The research also compares scenarios using the complete transcript and its two sub-sections, the management presentation and the questions-and-answers part. No distinctive differences are detected here; all three can improve the tested models to a similar extent. Across the classification models, the Random Forest yields by far the highest accuracies for predicting both revenue and profit margin changes. Both Logistic Regression and Decision Tree models are consistently improved by the inclusion of topics, while XGBoost provides mixed results, usually with only moderate or no improvement. Overall, the analysis shows that applying topic modeling to conference calls in the Nordic setting can provide useful additional information when combined with other financial figures to forecast changes in revenues and profitability.
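The approach summarized above, deriving per-transcript topic weights and adding them to financial predictors in a classifier, can be sketched as follows. This is a minimal illustration assuming scikit-learn, not the thesis's actual pipeline: the function names `topic_features` and `compare_accuracy`, the pairing of LDA with raw term counts and NMF with TF-IDF weights, the Random Forest settings, the 5-fold cross-validation, and the synthetic "transcripts" and labels are all hypothetical choices made for the example.

```python
import numpy as np
from sklearn.decomposition import NMF, LatentDirichletAllocation
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.model_selection import cross_val_score


def topic_features(transcripts, n_topics=5, method="lda", seed=0):
    """Hypothetical helper: return an (n_docs, n_topics) matrix of topic weights."""
    if method == "lda":
        # Assumption: LDA is fitted on raw term counts.
        vec = CountVectorizer(stop_words="english")
        model = LatentDirichletAllocation(n_components=n_topics, random_state=seed)
    elif method == "nmf":
        # Assumption: NMF is fitted on TF-IDF weights, a common convention.
        vec = TfidfVectorizer(stop_words="english")
        model = NMF(n_components=n_topics, init="nndsvda", random_state=seed, max_iter=500)
    else:
        raise ValueError(f"unknown method: {method}")
    dtm = vec.fit_transform(transcripts)  # document-term matrix
    return model.fit_transform(dtm)       # per-document topic weights


def compare_accuracy(financials, transcripts, y, method="lda", n_topics=5, seed=0):
    """Hypothetical helper: cross-validated accuracy without and with topics."""
    clf = RandomForestClassifier(n_estimators=200, random_state=seed)
    base = cross_val_score(clf, financials, y, cv=5, scoring="accuracy").mean()
    # The topic model uses no labels, but a stricter setup would refit it
    # inside each training fold to avoid any transductive leakage.
    topics = topic_features(transcripts, n_topics, method, seed)
    augmented = np.hstack([financials, topics])
    aug = cross_val_score(clf, augmented, y, cv=5, scoring="accuracy").mean()
    return base, aug


# Synthetic demo data (hypothetical): label 1 = "revenue up next period".
rng = np.random.default_rng(0)
growth = ["demand", "orders", "expansion", "pricing", "backlog", "record"]
pressure = ["costs", "headwinds", "restructuring", "impairment", "decline", "weak"]
y = rng.integers(0, 2, size=60)
transcripts = [" ".join(rng.choice(growth if label else pressure, size=40)) for label in y]
financials = rng.normal(size=(60, 3))  # placeholder ratios, pure noise here

base, aug = compare_accuracy(financials, transcripts, y, method="nmf", n_topics=2)
print(f"financials only: {base:.2f}  financials + topics: {aug:.2f}")
```

Because the synthetic financial columns are pure noise while the vocabulary is tied to the label, the topic-augmented model should score clearly higher here; on real earnings call data the gap would be far smaller and must be measured, as the thesis does across several classifiers and transcript sections.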

Educations: MSc in Finance and Investments (Graduate Programme), Final Thesis
Publication date: 2020
Number of pages: 91