# Electricity Consumption in Denmark: A Statistical Approach

Julie Topp Hansen & Iben Linnea Christensen

Student thesis: Master thesis

## Abstract

This thesis examines electricity consumption in Denmark. We collected a dataset from Energinet covering the past five years, amounting to over a million observations, and supplemented it with other relevant variables, including weather and prices. The data span two price areas, DK1 and DK2, and a statistical test showed a significant difference in consumption between them. The main part of the analysis therefore treats the two price areas separately. We ask two questions: whether it is possible to obtain a good overview of this large and complex dataset, and whether the included variables allow us to build an acceptable statistical model that describes consumption. To investigate these questions, we draw on several theories and methods. Visual analytics for time series data places particular demands on the charts, because the time variable must be taken into account. This rules out some of the most common visual analytics methods, but other charts are particularly well suited to handling time. To estimate a statistical model that describes consumption, we use some of the most common statistical approaches: we set up relevant hypotheses and test them with t-tests, check the variables for variance inflation, and model interactions. A Box-Cox analysis showed that the response variable needed to be transformed. We end up with two models, with adjusted R-squared values of 0.7498 for DK1 and 0.8442 for DK2. We continue working with the two separate datasets in the time series analysis. Relying mainly on autocorrelation and partial autocorrelation plots, we determine which autoregressive and moving average components the ARIMA models need in order to be stationary processes. We check the stationarity assumption with Dickey-Fuller tests and, with satisfactorily low p-values, define our ARIMA models.
The best model for DK1 is an ARIMA(2,0,0)(0,1,1)₁₂, and the best model for DK2 is an ARIMA(0,0,1)(0,1,1)₁₂. Forecasts from both models yield plots in which the forecasted values follow the same pattern as consumption in previous years. This approach allowed us to look at the subject from different perspectives. Visual analytics helped us understand the characteristics of the variables, statistical modelling helped us describe electricity consumption, and time series analysis enabled us to forecast future consumption. Altogether, this has given us a comprehensive and detailed insight into electricity consumption in Denmark.
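
For readers unfamiliar with the seasonal ARIMA notation, the two selected models can be written out with the backshift operator. This is only a sketch of the standard textbook form, not the authors' own specification: the symbols $y_t$ (the consumption series used in the time series analysis), $B$ (backshift, $B y_t = y_{t-1}$), $\phi$, $\theta$, $\Theta$ and $\varepsilon_t$ are assumed here and do not appear in the abstract. The plus sign on the moving average terms is one common convention, and no constant term is assumed.

```latex
% Assumed notation: y_t = consumption series, B = backshift operator,
% \varepsilon_t = white-noise error; coefficients are unnamed in the abstract.
\begin{align}
  % DK1: ARIMA(2,0,0)(0,1,1)_{12}
  (1 - \phi_1 B - \phi_2 B^{2})\,(1 - B^{12})\, y_t
    &= (1 + \Theta_1 B^{12})\, \varepsilon_t \\
  % DK2: ARIMA(0,0,1)(0,1,1)_{12}
  (1 - B^{12})\, y_t
    &= (1 + \theta_1 B)\,(1 + \Theta_1 B^{12})\, \varepsilon_t
\end{align}
```

In both models the only differencing is the seasonal difference $(1 - B^{12})\, y_t = y_t - y_{t-12}$ with period 12. DK1 adds two non-seasonal autoregressive terms, DK2 adds one non-seasonal moving average term, and both include one seasonal moving average term.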