Abstract
The number of large language models for code generation is rising. However, comprehensive evaluations that focus on reliability and security remain sparse. This study evaluated the quality of Python code generated by five large language models: GPT-4-Turbo, DeepSeek-Coder-33B-Instruct, Gemini Pro 1.0, Codex, and CodeLlama-70B-Instruct. To ensure a fair comparison, the evaluation covered three diverse application domains with varying prompt lengths. We found that GPT-4-Turbo generated, on average, 4.5% more secure code than a Python developer with three years of experience.
Original language | English |
---|---|
Title of host publication | Proceedings - 33rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, PDP 2025 |
Place of Publication | Los Alamitos, CA |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Publication date | Mar 2025 |
Pages | 13-16 |
ISBN (Print) | 9798331524944 |
ISBN (Electronic) | 9798331524937 |
Publication status | Published - Mar 2025 |
Event | 33rd Euromicro International Conference on Parallel, Distributed and Network-based Processing, PDP 2025 - University of Turin, Torino, Italy. Duration: 12 Mar 2025 → 14 Mar 2025. Conference number: 33. https://pdp2025.org/ |
Conference
Conference | 33rd Euromicro International Conference on Parallel, Distributed and Network-based Processing, PDP 2025 |
---|---|
Number | 33 |
Location | University of Turin |
Country/Territory | Italy |
City | Torino |
Period | 12/03/2025 → 14/03/2025 |
Internet address | https://pdp2025.org/ |
Keywords
- Code
- LLM
- Python
- Reliability
- Safety
- Security