Code summaries are essential for helping developers understand code functionality and reducing maintenance and collaboration costs. Although recent advances in large language models (LLMs) have significantly improved automatic code summarization, the practical usefulness of generated summaries in industrial settings remains insufficiently explored. In collaboration with documentation experts from the industrial HarmonyOS project, we conducted a questionnaire study showing that over 57.4% of code summaries produced by state-of-the-art approaches were rejected due to violations of developers' expectations for industrial documentation. Beyond semantic similarity to reference summaries, developers emphasize additional requirements, including the use of appropriate domain terminology, explicit function categorization, and the avoidance of redundant implementation details. To address these expectations, we propose ExpSum, an expectation-aware code summarization approach that integrates function metadata abstraction, informative metadata filtering, context-aware domain knowledge retrieval, and constraint-driven prompting to guide LLMs in generating structured, expectation-aligned summaries. We evaluate ExpSum on the HarmonyOS project and widely used code summarization benchmarks. Experimental results show that ExpSum consistently outperforms all baselines, achieving improvements of up to 26.71% in BLEU-4 and 20.10% in ROUGE-L on HarmonyOS. Furthermore, LLM-based evaluations indicate that ExpSum-generated summaries better align with developer expectations across other projects, demonstrating its effectiveness for industrial code documentation.
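To make the four stages named above more concrete, the following is a minimal sketch of how metadata abstraction, metadata filtering, domain-term retrieval, and constraint-driven prompting might fit together. It is not the ExpSum implementation: the `FunctionMetadata` fields, the choice of which fields count as informative, the substring-based retrieval, the glossary entries, and the wording of the constraints are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FunctionMetadata:
    """Hypothetical abstraction of a function; field names are illustrative."""
    name: str
    signature: str
    category: str  # assumed label, e.g. "callback" or "getter"
    called_apis: list[str] = field(default_factory=list)
    local_variables: list[str] = field(default_factory=list)


def filter_metadata(meta: FunctionMetadata) -> dict[str, object]:
    """Keep the fields assumed to be informative; drop implementation detail."""
    return {
        "name": meta.name,
        "signature": meta.signature,
        "category": meta.category,
        "called_apis": meta.called_apis,
    }


def retrieve_terms(meta: FunctionMetadata, glossary: dict[str, str]) -> dict[str, str]:
    """Naive stand-in for retrieval: keep glossary terms that occur in the
    function name or in the names of the APIs it calls."""
    haystack = " ".join([meta.name, *meta.called_apis]).lower()
    return {term: text for term, text in glossary.items() if term.lower() in haystack}


def build_prompt(meta: FunctionMetadata, glossary: dict[str, str]) -> str:
    """Assemble a prompt that states the assumed constraints explicitly."""
    lines = [
        "Summarize the function described below.",
        "Constraints:",
        "- Begin with the function category.",
        "- Use the listed domain terms where they apply.",
        "- Do not describe implementation details.",
        "Metadata:",
    ]
    lines += [f"- {key}: {value}" for key, value in filter_metadata(meta).items()]
    lines.append("Domain terms:")
    lines += [f"- {term}: {text}" for term, text in retrieve_terms(meta, glossary).items()]
    return "\n".join(lines)


# Hypothetical inputs; the glossary texts are placeholders, not real definitions.
meta = FunctionMetadata(
    name="onAbilityStart",
    signature="void onAbilityStart(Request request)",
    category="callback",
    called_apis=["startAbility"],
    local_variables=["tmp"],
)
glossary = {
    "ability": "placeholder definition",
    "preference": "placeholder definition",
}
prompt = build_prompt(meta, glossary)
```

In this sketch the local variable names never reach the prompt, which mirrors the stated goal of avoiding redundant implementation details, and only glossary terms matched against the function's context are included. A real system would presumably replace the substring match with a proper retrieval step and send the resulting prompt to an LLM.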