Evaluating Creative Output With Generative Artificial Intelligence: Comparing GPT Models and Human Experts in Idea Evaluation

Research output: Contribution to journal › Journal article › Research › peer-review


Abstract

Traditional techniques for evaluating creative outcomes are typically based on evaluations made by human experts. These methods suffer from challenges such as subjectivity, biases, limited availability, ‘crowding’, and high transaction costs. We propose that large language models (LLMs) can be used to overcome these shortcomings. However, there is a dearth of research comparing the performance of LLMs to traditional expert evaluations of creative outcomes such as ideas. Our study compares the alignment of expert evaluations with evaluations from the LLM GPT-4. Our results reveal that to achieve moderate evaluation alignment with experts, LLMs require a base framework and a spectrum-based few-shot prompt. We offer six theoretical contributions, shifting the focus from whether LLMs can evaluate to how specific design choices shape their alignment with human judgement. These insights are situated within broader frameworks from cognitive science, creativity theory, and machine learning. Furthermore, we outline six propositions for organizations interested in LLM-supported evaluation methods. Key recommendations include utilizing base frameworks for large-scale idea screening, establishing a database of evaluated ideas to optimize few-shot performance, and leveraging AI–human collaboration for internal and external idea sourcing. Additionally, we highlight the need for privacy considerations when using third-party LLMs for proprietary idea evaluations. This research contributes to the innovation management literature by exploring methods for integrating LLMs into creative evaluation processes to enhance scalability and efficiency while retaining evaluation quality.
Original language: English
Journal: Creativity and Innovation Management
Volume: 34
Issue number: 4
Pages (from-to): 991-1012
Number of pages: 22
ISSN: 0963-1690
DOIs
Publication status: Published - Dec 2025

Bibliographical note

Published online: 11 August 2025.

Keywords

  • Assessment technique
  • Creativity management
  • Deep learning
  • Evaluation technique
  • External sourcing
  • Idea screening
  • Large language models
  • Open innovation
  • Outside-in innovation