Classifying Political Speeches Using Machine Learning

Sara Lee Naldal

Student thesis: Master thesis


This thesis investigates the political affiliation of statements made by members of the Danish Parliament. It employs methods from computational linguistics, a field that combines machine learning methods from computer science with linguistic knowledge to perform natural language processing. Specifically, I work within the framework of sentiment analysis and its concept of document-level sentiment analysis.
The thesis attempts to automatically classify the statements according to their political affiliation using machine learning. The empirical material consists of speeches from the Danish Parliament over an 11-year period. I employ a support vector machine and a neural network for the task and test them both on the full dataset and on temporal slices of the data based on year of origin. The support vector machine represents a typical machine learning method with a proven record of good results in document classification tasks. The neural network was implemented to test a deep learning approach on the dataset and to compare its performance with that of the support vector machine.
In my testing I perform binary and multiclass classification, compare the overall performance of the two methods on the full dataset as well as on its partitions, and discuss the strengths and shortcomings of each method relative to the other. Both methods achieve fairly good results compared to current research on both the binary and the multiclass task, although they perform much better on the binary task. I then investigate the generalizability of both methods by testing them on a dataset consisting of tweets and status updates from Twitter and Facebook, and discuss the results and the degree of domain dependence shown by each method.
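The abstract does not describe how the classifiers were implemented, so the following is only a hypothetical sketch of the general technique: a linear SVM over bag-of-words counts for the binary task, extended to the multiclass task with one party-vs-rest classifier per party. It assumes crude whitespace tokenisation, no bias term, and Pegasos-style stochastic subgradient training (a swapped-in algorithm chosen so the sketch needs only the standard library); the function names, the λ = 0.01 and 50-epoch settings, and any party labels are illustrative inventions, not the thesis's own choices. A real study would more likely rely on an established library such as scikit-learn's `LinearSVC`.

```python
import random
from collections import Counter


def bag_of_words(text):
    """Crude tokenisation: lower-case and split on whitespace (an assumption, not the thesis's preprocessing)."""
    return Counter(text.lower().split())


def score(weights, features):
    """Linear decision value w.x over sparse term counts."""
    return sum(weights.get(term, 0.0) * count for term, count in features.items())


def train_binary_svm(docs, labels, lam=0.01, epochs=50, seed=0):
    """Pegasos-style stochastic subgradient descent on the regularised hinge loss.

    Labels must be +1 or -1. No bias term is learned, to keep the sketch short.
    """
    rng = random.Random(seed)
    data = [(bag_of_words(d), y) for d, y in zip(docs, labels)]
    weights = {}
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for features, y in data:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y * score(weights, features)  # margin under the current weights
            # Regularisation step: shrink every weight towards zero.
            for term in weights:
                weights[term] *= 1.0 - eta * lam
            # Hinge-loss step, taken only when the example lies inside the margin.
            if margin < 1:
                for term, count in features.items():
                    weights[term] = weights.get(term, 0.0) + eta * y * count
    return weights


def train_one_vs_rest(docs, parties, **kwargs):
    """Multiclass via one binary SVM per party (that party = +1, all others = -1)."""
    return {
        party: train_binary_svm(docs, [1 if p == party else -1 for p in parties], **kwargs)
        for party in sorted(set(parties))
    }


def predict_party(models, text):
    """Pick the party whose binary classifier gives the highest decision value."""
    features = bag_of_words(text)
    return max(models, key=lambda party: score(models[party], features))
```

One-vs-rest is only one common way to turn a binary SVM into a multiclass classifier; the abstract does not say which strategy the thesis used.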
Evaluating the project as a whole, there are definite options for future development, including deeper preprocessing of the data prior to training and a more exploratory approach to investigating the language use of the different parties, both as a whole and over time.
Finally, I discuss possible broader use cases for the automatic detection of political affiliation, focusing especially on online texts, including microtargeting and bias detection.

Education: MSc in Business Administration and Information Systems (Graduate Programme), Final Thesis
Publication date: 2018
Number of pages: 82
Supervisors: Daniel Hardt