Code LLMs Still Fall Short of Top Programmers: Evaluating Algorithmic Code Generation Through Computational Thinking

  • Shisong Chen
  • Ziyu Zhou
  • Yicong Zhao
  • Chengyi Yang
  • Zhixu Li*
  • Yanghua Xiao
  • Xin Lin
  • Xiaojun Meng
  • Jiansheng Wei
  • Kuien Liu
  • *Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

Abstract

Evaluating the coding capabilities of models through algorithmic code generation is challenging, as it requires deep problem understanding and complex algorithm design. Current benchmarks suffer from a narrow focus on final execution results (such as pass@k), neglecting the crucial reasoning and problem-solving processes inherent in code generation. To address this limitation, we introduce a multi-phase algorithmic code generation benchmark, MUPA, structured around human computational thinking. MUPA dissects the evaluation into four distinct phases: example understanding, algorithm selection, solution description, and code generation. This framework facilitates a comprehensive assessment by providing insights into the model's intermediate problem-solving steps, rather than just the final code. We manually curated 197 high-quality competitive programming problems from Codeforces. Utilizing an LLM-as-a-judge paradigm with specialized prompts, our rigorous evaluation of several existing code generation LLMs reveals significant across-the-board challenges. Notably, we establish a positive correlation, indicating that proficiency in an earlier phase directly impacts performance in subsequent phases, underscoring the interdependency of these algorithmic skills. The benchmark is publicly available at https://github.com/cheniison/MUPA.
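
The abstract mentions pass@k as a typical execution-based metric. For readers unfamiliar with it, the following is a minimal sketch of the commonly used unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), where n samples are generated per problem and c of them pass all tests. The function name `pass_at_k` and its parameters are illustrative assumptions; the abstract does not say how MUPA computes its own scores.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n generated samples, c of which are correct.

    Illustrative helper (not taken from the MUPA repository): returns the
    probability that at least one of k samples drawn without replacement
    from the n generated samples is correct.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # If fewer than k samples are incorrect, every draw of k contains a pass.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # Hypothetical numbers: 10 samples for one problem, 3 of them correct.
    print(pass_at_k(n=10, c=3, k=1))
    print(pass_at_k(n=10, c=3, k=5))
```

A metric of this kind only reflects whether the final code passes, which is the limitation the phase-wise evaluation described above is meant to address.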

Original language: English
Title of host publication: WSDM 2026 - Proceedings of the 19th ACM International Conference on Web Search and Data Mining
Publisher: Association for Computing Machinery, Inc
Pages: 79-88
Number of pages: 10
ISBN (Electronic): 9798400722929
DOI
Publication status: Published - 21 Feb 2026
Event: 19th ACM International Conference on Web Search and Data Mining, WSDM 2026 - Boise, United States
Duration: 22 Feb 2026 to 26 Feb 2026

Publication series

Name: WSDM 2026 - Proceedings of the 19th ACM International Conference on Web Search and Data Mining

Conference

Conference: 19th ACM International Conference on Web Search and Data Mining, WSDM 2026
Country/Territory: United States
City: Boise
Period: 22/02/26 to 26/02/26
