Viability of sentiment analysis in business: Evaluating accuracy and the supporting NLP technologies

Emil Blædel Nygaard & Peter Halling Hilborg

Student thesis: Master thesis


Natural language processing (NLP) has in recent years seen more and more implementations in businesses, largely because of sentiment analysis and its applications to social data, such as social media monitoring. In this paper we therefore explore the accuracy and applications of sentiment analysis and consider it against the related technologies on the market. To this end, a number of interviews and data comparisons were conducted. A number of sentiment analysis tools were chosen for the test, specifically The Stanford Classifier, Sentistrength, Semantria, Radian6 and “The Feeling Meter” from Copenhagen Business School. A test set of approximately 18,500 Facebook posts was extracted via Radian6 and run against all of the tools in order to compare their performance. Furthermore, a representative sample of 1,000 posts was extracted and manually annotated by four individual human coders. Extending the interviews through discussion and speculation, a number of applications are suggested as opportunities for development or value generation. Because the paper does not focus on a specific industry or application, it has not been possible to categorize value in specific terms. The results showed that the commercial tools Sentistrength and Semantria were by far the most accurate when compared to the human annotation. The tools were found to be highly reliant on the data they were trained on, and the training data also plays a large role in domain adaptation. On the individual post level, all the tools performed extremely poorly, but all were significantly more accurate, measured against the human annotation, when the aggregate score was considered. We conclude that the performance of the tools might not be due to the technology, but rather to the approach to sentiment that has been chosen in the industry. Labeling with 3-5 sentiment labels adds only very minor value to the data, too little for it to have a valuable business application.
Instead, we suggest an approach where sentiment is represented by a more dynamic numeric value, derived via annotators using a dynamic slider rather than a negative/positive label. This will create a different measurement for accuracy, one that might be more in tune with the actual annotation. Through our interviews with a number of academics as well as Trustpilot and Falcon Social, we also speculate on and discuss the extended value of using sentiment analysis as an extension of other technologies or for other applications. The conclusion, based on the low accuracy found in our tests and on points raised in the interviews, is that the current form of sentiment analysis does not utilize the potential of the data. We therefore suggest a number of different extended applications. The common thread is that sentiment analysis in its current form is uninformative and needs to be augmented with external data parameters, such as extralinguistic data, profile data or topic mining, in order to add more value. While the extra dimension added in the Feeling Meter (arousal) contributes a notable degree of additional information, the manner in which people express themselves on social media means that insights beyond sentiment alone need to be mined in order to gain appropriate and useful insights into the dialogue.
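The contrast drawn above between post-level accuracy and aggregate accuracy can be illustrated with a minimal Python sketch. The labels, the -1/0/1 encoding and the function names are illustrative assumptions only; they are not data or methods taken from the thesis.

```python
from statistics import mean

# Illustrative labels only (NOT data from the thesis).
# Assumed encoding: -1 = negative, 0 = neutral, 1 = positive.
human = [1, -1, 0, 1, -1, 0, 1, -1]   # hypothetical human annotation
tool = [-1, 1, 0, 0, 0, 1, 1, -1]     # hypothetical tool output


def post_level_accuracy(gold, predicted):
    """Share of posts where the tool's label equals the human label."""
    if len(gold) != len(predicted):
        raise ValueError("label lists must have the same length")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def aggregate_error(gold, predicted):
    """Absolute gap between the mean human score and the mean tool score."""
    return abs(mean(gold) - mean(predicted))


if __name__ == "__main__":
    print("post-level accuracy:", post_level_accuracy(human, tool))
    print("aggregate error:", aggregate_error(human, tool))
```

In this toy example the tool agrees with the human label on only 3 of 8 posts, yet the mean scores differ by just 0.125, because individual misclassifications in opposite directions partly cancel out when scores are aggregated.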

Educations: MSc in Business Administration and Information Systems, (Graduate Programme) Final Thesis
Publication date: 2015
Number of pages: 108