Code complexity metrics such as cyclomatic complexity have long been used to assess software quality and maintainability. With the rapid advancement of large language models (LLMs) on coding tasks, an important yet underexplored question arises: do traditional complexity metrics meaningfully characterize the coding difficulty that LLMs perceive? In this work, we empirically demonstrate that classical complexity metrics exhibit no consistent correlation with LLM performance, revealing a fundamental mismatch with model-perceived difficulty. To address this gap, we propose LM-CC, a novel code complexity metric tailored for LLMs, grounded in the hypothesis that model-perceived code difficulty is fundamentally driven by semantic nonlinearity. LM-CC quantifies complexity through an entropy-guided semantic compositional hierarchy, capturing the cumulative uncertainty encountered by LLMs during code understanding. Our experimental results demonstrate that LM-CC exhibits strong and consistent partial correlations with LLM performance, while semantics-preserving reductions in LM-CC consistently lead to improved downstream task performance. The source code is available at: https://github.com/xchen121/lm-cc.
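Cyclomatic complexity, the classical metric named above, counts the linearly independent paths through a program's control flow. It is commonly approximated as one plus the number of decision points. The following minimal sketch assumes Python source input and uses only the standard-library `ast` module to show how such a purely structural score is computed. The function name `cyclomatic_complexity` and the exact set of counted constructs are illustrative assumptions. They are not taken from LM-CC or from any particular analysis tool.

```python
import ast

# Statement/expression types that each add one decision point
# (a simplified McCabe-style count; match/case, assert, etc. are ignored).
_BRANCH_NODES = (ast.If, ast.For, ast.AsyncFor, ast.While,
                 ast.IfExp, ast.ExceptHandler)


def cyclomatic_complexity(source: str) -> int:
    """Approximate cyclomatic complexity of a snippet: 1 + decision points."""
    tree = ast.parse(source)
    decisions = 0
    for node in ast.walk(tree):
        if isinstance(node, _BRANCH_NODES):
            decisions += 1
        elif isinstance(node, ast.comprehension):
            # Each `for` clause in a comprehension loops, and each `if` filters.
            decisions += 1 + len(node.ifs)
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` short-circuits at len(values) - 1 points.
            decisions += len(node.values) - 1
    return decisions + 1


SAMPLE = '''
def classify(n):
    if n < 0:
        return "neg"
    elif n == 0:
        return "zero"
    for i in range(3):
        if i and n:
            pass
    return "pos"
'''

print(cyclomatic_complexity(SAMPLE))  # → 6
```

Because this count looks only at control-flow structure, it ignores what the code actually computes. Two snippets with identical scores can therefore differ widely in how hard they are to understand. This is the kind of structural metric that the abstract reports as lacking a consistent correlation with LLM performance.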