Statistical Analysis of a Data Centre Resource Usage Patterns: A Case Study

Somnath Mazumdar*, Anoop S. Kumar

Performance evaluation is necessary to understand the runtime behaviour of a computing system. A better understanding of resource usage leads to better utilisation and lower energy cost. To optimise server provisioning and the energy cost of a data centre (DC), we need to explore the underlying resource usage patterns and extract meaningful information from them. In this paper, our primary goal is to measure the correlation or cross-correlation among CPU, RAM, and Network usage in a DC at different timescales. To perform this analysis, we collected Wikimedia grid traces and conducted an experimental campaign using several purposefully selected statistical methods: a univariate method (the Hurst exponent); multivariate explanatory methods, such as wavelets and cross-recurrence quantification analysis (CRQA); and multivariate predictive methods, such as vector auto-regression (VAR) and multivariate adaptive regression splines (MARS). It is worth noting that we analyse the data without any prior knowledge of the running applications. We present the results together with a comprehensive analysis. In our case study, we found that CPU and RAM usage are more strongly correlated than Network usage at long timescales. We also show that wavelet-based methods are superior for detecting long-run relationships among these resource variables.
