Abstract
The application of large language models (LLMs) for code generation, exemplified by GitHub Copilot, has surged, yet comprehensive evaluations under realistic conditions remain sparse. This study addresses that gap by employing a rigorous methodological framework to evaluate the quality of code generated by nine state-of-the-art LLMs in realistic software development scenarios. The research spans various domains and levels of prompt detail, reflecting the different contexts in which this technology will be employed. At the centre of the evaluation is ISO/IEC 5055:2021 (ISO 5055), a recognized code quality standard that assesses code across the principal categories of maintainability, reliability, performance efficiency, and security. By centring the evaluation on ISO 5055, the research quantifies the adherence of LLM outputs to a recognized code quality standard, thus providing an objective basis for comparing different LLMs. Popular static code analysis tools are employed to perform a comprehensive evaluation that generates quantitative metrics for the analysis. This process revealed that LLMs demonstrate mixed results across the ISO 5055 categories, often meeting or exceeding baseline performance when appropriately prompted and configured. Among the LLMs, GPT-4-Turbo and Gemini performed slightly better than the rest. This analysis provided a detailed, objective assessment of code quality, which paved the way for creating the comprehensive development framework Programmatic Excellence via LLM Iteration (PELLI). At the heart of PELLI is an iterative, analysis-centred process focused on upholding high-quality code changes. Following this framework, practitioners can ensure harmonious integration between LLMs and human developers, ensuring their potential is fully realised.
The research contributes to the ongoing development of generative LLMs for code synthesis. It serves as a practical guide for developers aiming to leverage LLMs while adhering to recognized quality standards. This study's outcomes are crucial for advancing LLM technologies in real-world applications, providing stakeholders with a clear understanding of where these LLMs excel and where they require further refinement.
| Education | MSc in Business Administration and Data Science (Graduate Programme), Final Thesis |
|---|---|
| Language | English |
| Publication date | 2024 |
| Number of pages | 81 |
| Supervisors | Somnath Mazumdar |