Towards a Standardized Corporate Sustainability Index: Leveraging the SDGs and Deep Learning to Evaluate Companies' Sustainability Efforts

Pascal Essig & Martin Thoma

Student thesis: Master thesis


Because climate change poses economic and non-economic risks, investors and other key stakeholders have become far more aware of and interested in companies’ sustainability efforts over the last decades. Greenwashing and the lack of structured data make these efforts difficult to evaluate, and no standardized evaluation measure has yet become predominant. This thesis addresses the question “How to measure corporate sustainability using Natural Language Processing (NLP) methods on textual data?”. Since the 17 Sustainable Development Goals (SDGs) are widely known and cover all relevant aspects of sustainability, they provide a suitable framework for improving current rating standards. Classification algorithms leverage hand-labeled SDG and company-specific data to evaluate sustainability-related efforts detected in job postings. A rule-based baseline model relying on an SDG keyword dictionary is compared with a deep learning model on predicting SDG labels for the job posting data. Both models reach an accuracy of 0.80 on the test data; the macro F1 score is 0.39 for the baseline model and 0.46 for the deep learning model. Both models are solid solutions for labeling sentences with the SDGs in a multi-label classification setting. The intuitive rule-based model can be developed quickly and is well suited to proof-of-concept solutions. The deep learning model, which incorporates contextual information, provides a more sophisticated solution with the potential to improve beyond the achieved scores. This work lays the foundation for an overall SDG-based sustainability index for companies. Furthermore, it lowers the effort required to transfer the classification models from the job posting domain to other data sources. The balance and information value of the SDG index grow as more data sources are incorporated. Lastly, a theoretical proposal is presented on how key stakeholders can benefit from such an SDG index in the future.
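
To make the rule-based baseline and the two reported metrics more concrete, the following is a minimal sketch, not the thesis's implementation. The keyword dictionary, the example sentences, the gold labels, and the function names `predict_sdgs`, `macro_f1`, and `label_accuracy` are all hypothetical. Accuracy is computed here per (sentence, label) decision, which is only one possible reading of the accuracy reported above.

```python
import re

# Hypothetical keyword dictionary; the dictionary used in the thesis is not shown here.
SDG_KEYWORDS = {
    "SDG 4": ["education", "training"],
    "SDG 7": ["renewable energy", "solar"],
    "SDG 13": ["climate", "emissions"],
}
LABELS = list(SDG_KEYWORDS)


def predict_sdgs(sentence, keywords=SDG_KEYWORDS):
    """Return every SDG label that has at least one keyword in the sentence."""
    text = sentence.lower()
    return {
        label
        for label, terms in keywords.items()
        if any(re.search(r"\b" + re.escape(term) + r"\b", text) for term in terms)
    }


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-label F1; a label with no TP, FP, or FN scores 0."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(label in t and label in p for t, p in pairs)
        fp = sum(label not in t and label in p for t, p in pairs)
        fn = sum(label in t and label not in p for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def label_accuracy(y_true, y_pred, labels):
    """Share of (sentence, label) decisions that match the gold annotation."""
    correct = sum(
        (label in t) == (label in p)
        for t, p in zip(y_true, y_pred)
        for label in labels
    )
    return correct / (len(y_true) * len(labels))


# Made-up job posting sentences with made-up gold labels.
sentences = [
    "We offer training on solar installations.",
    "Help us cut emissions across our fleet.",
    "You will manage payroll for the sales team.",
    "Support our climate education programme.",
]
y_true = [{"SDG 7"}, {"SDG 13"}, set(), {"SDG 4", "SDG 13"}]
y_pred = [predict_sdgs(s) for s in sentences]

print("macro F1:", round(macro_f1(y_true, y_pred, LABELS), 3))
print("accuracy:", round(label_accuracy(y_true, y_pred, LABELS), 3))
```

In this toy data the first sentence triggers "SDG 4" through the keyword "training" although the gold label is only "SDG 7", a false positive of the kind a context-free dictionary lookup cannot avoid. Because most (sentence, label) pairs are negative in a multi-label setting with many labels, accuracy counted this way can stay high while macro F1, which weights every label equally, is considerably lower.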

Educations: MSc in Computer Science, (Graduate Programme) Final Thesis
Publication date: 2021
Number of pages: 136
Supervisors: Raghava Rao Mukkamala