Abstract
The number of large language models for code generation is rising. However, comprehensive evaluations that focus on reliability and security remain sparse. This study evaluated the quality of Python code generated by five large language models: GPT-4-Turbo, DeepSeek-Coder-33B-Instruct, Gemini Pro 1.0, Codex, and CodeLlama-70B-Instruct. The evaluation considered three diverse application domains with varying prompt lengths for fair comparison. We found that GPT-4-Turbo generated, on average, 4.5% more secure code than a Python developer with three years of experience.
Original language | English |
---|---|
Title | Proceedings - 33rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, PDP 2025 |
Place of publication | Los Alamitos, CA |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Publication date | Mar 2025 |
Pages | 13-16 |
ISBN (Print) | 9798331524944 |
ISBN (Electronic) | 9798331524937 |
DOI | |
Status | Published - Mar 2025 |
Event | 33rd Euromicro International Conference on Parallel, Distributed and Network-based Processing. PDP 2025 - University of Turin, Torino, Italy. Duration: 12 Mar 2025 → 14 Mar 2025. Conference number: 33. https://pdp2025.org/ |
Conference
Conference | 33rd Euromicro International Conference on Parallel, Distributed and Network-based Processing. PDP 2025 |
---|---|
Number | 33 |
Location | University of Turin |
Country/Territory | Italy |
City | Torino |
Period | 12/03/2025 → 14/03/2025 |
Internet address |
Keywords
- Code
- LLM
- Python
- Reliability
- Safety
- Security