Data Quality in Analytics Area

Zoltán Endrodi

Student thesis: Master thesis

Abstract

This thesis examines which factors affect data quality in Business Intelligence (BI) systems. The question has gained importance in the era of datafication and Big Data: the volume and variety of data sources are growing, yet most companies still struggle to make even their own internal data valid and reliable for reporting tools. In the paper I conducted a literature review in which I introduce the terms and concepts related to data quality and the business analytics market. I also cover the latest market trends connected to BI platforms and to the data itself, and I briefly discuss the buzzwords Big Data and Cloud. The literature review followed a database-driven method: after locating the potentially relevant articles, I collected them in an Excel table and categorized them to make the process more effective. To answer the stated research question, I gathered primary data through one unstructured and four semi-structured interviews. The unstructured interview was held with a data miner and Big Data expert, who gave me an overall understanding of what a real-life Big Data project looks like in practice. The semi-structured interviews were conducted with IT managers and guided by previously collected questions, while leaving the interviewees enough freedom to keep the conversation flowing. After the data collection I created an affinity diagram (KJ method) to analyze the results. This method is a multi-step categorization process: I listened to the recorded interviews and wrote down all the thoughts and ideas, then grouped these notes and gave each group a name. The analysis resulted in the following four groups: Business use, Trust, Data validity and Data sources.
From the data pool generated from the interviews, I derived these four factors, which have a relevant impact on data quality in the BI area. In the paper I also mention future opportunities that BI may offer in the short or long term, e.g. improved predictive analytics, multimedia-based Business Intelligence, embedded intelligence or voice-based BI. Finally, after reading this paper one can get an overall understanding of why I presume these factors affect data quality, and in the conclusion I recommend directions for further research that could build on this paper.

Educations: MSc in Business Administration and Information Systems, (Graduate Programme) Final Thesis
Language: English
Publication date: 2016
Number of pages: 43
Supervisors: Stefan Henningsson