The Socioeconomic Development of Zimbabwe: Combining Promising Wealth and Poverty Estimates to Understand and Predict Poverty

Daniel Christopher Torpe Blatch

Student thesis: Master thesis


The aim of this thesis is to understand and predict the socioeconomic development of a developing country using machine learning, big data, and promising wealth and poverty estimates. The research design is a single, holistic case study with an exploratory design to address the research problem, with Zimbabwe as both the case country and the unit of analysis. I collect household survey data and geospatial covariates from the Demographic and Health Surveys (DHS). Building on prior research, the feature selection process combines three promising wealth and poverty estimates: call detail records, internet activity, and satellite imagery. I perform an exploratory data analysis and an exploratory spatial data analysis to understand the socioeconomic development. I fit a Voting Regression model, composed of Linear, Support Vector, ElasticNet, and Random Forest regression models, that describes the relationship between five features (mobile ownership, internet usage last month, transaction by mobile, family planning from mobile, and nightlight) and the wealth index (WI) derived from the DHS. The major findings are a very strong, positive linear correlation between the call detail records and internet activity features; that wealth is highly concentrated in the central regions of Harare and Bulawayo; that, in the last year, people have on average gone from never using the internet to using it; and that most of the country is without nightlight. Furthermore, a trend indicates that the northern and western regions of Zimbabwe have the lowest levels of wealth. The final machine learning model accurately estimates the WI of the 400 clusters in Zimbabwe (cross-validated correlation coefficient of 0.93 and coefficient of determination of 0.88).
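As a rough illustration of the modelling step described above, the following Python sketch shows unweighted prediction averaging, which is how a voting regressor combines its base models (scikit-learn's `VotingRegressor`, for example, averages its estimators' predictions), and computes the two reported metrics: the Pearson correlation coefficient and the coefficient of determination. It is not the thesis's pipeline. The function names `vote`, `pearson_r`, and `r_squared` and all numbers are hypothetical, equal weights are assumed, and in a cross-validated evaluation `y_pred` would be the out-of-fold predictions for each cluster rather than in-sample fits.

```python
from math import sqrt


def vote(predictions_per_model):
    """Unweighted voting (assumed equal weights): average the base models'
    predictions for each cluster.

    predictions_per_model is a list of equal-length lists, one per base
    regressor (e.g. linear, support vector, ElasticNet, random forest).
    """
    n_models = len(predictions_per_model)
    # zip(*...) groups the i-th prediction of every model together
    return [sum(preds) / n_models for preds in zip(*predictions_per_model)]


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between observed and predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / sqrt(var_t * var_p)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical wealth-index values for five clusters (not thesis data)
    wi_observed = [-1.2, -0.4, 0.1, 0.9, 1.6]
    # Hypothetical out-of-fold predictions from two base regressors
    base_predictions = [
        [-1.0, -0.5, 0.2, 0.8, 1.4],
        [-1.3, -0.2, 0.0, 1.0, 1.5],
    ]
    wi_pred = vote(base_predictions)
    print(pearson_r(wi_observed, wi_pred), r_squared(wi_observed, wi_pred))
```

Note that the coefficient of determination compares predictions with observations directly, so it is not simply the square of the correlation coefficient; a model can be highly correlated with the WI yet still carry a systematic offset that lowers its R².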
The thesis shows that a manufactured, anonymous call detail records feature can be used to quantify poverty, reaffirming the usefulness of call detail records in data analysis and underscoring the need for open data policies from mobile network operators to enable cost- and time-effective poverty alleviation efforts. I establish the potential of combining call detail records, internet activity, and satellite imagery to inform policymakers and monitor poverty in real time by geographical district. Finally, the thesis provides the basis of an approach for government donors to allocate resources appropriately and monitor the impact of different interventions.

Education: MSc in Computer Science (Graduate Programme), Final Thesis
Publication date: 2021
Number of pages: 81
Supervisors: Somnath Mazumdar