The task of automated code review has recently gained significant attention from the machine learning community. However, current review-comment evaluation metrics rely on comparison with a human-written reference for a given code change (also called a diff), even though code review, like generation and summarization, is a one-to-many problem with many "valid reviews" for a single diff. To address this, we develop CRScore, a reference-free metric that measures dimensions of review quality such as conciseness, comprehensiveness, and relevance. We design CRScore to evaluate reviews in a way that is grounded in claims and potential issues detected in the code by LLMs and static analyzers. We demonstrate that CRScore produces valid, fine-grained scores of review quality with the highest alignment to human judgment (0.54 Spearman correlation) and greater sensitivity than reference-based metrics. We also release a corpus of 2.6k human-annotated review quality scores for machine-generated and GitHub review comments to support the development of automated metrics.
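The alignment figure reported above is a Spearman rank correlation between a metric's scores and human judgments. As a minimal sketch of how such an alignment number is computed, the snippet below implements Spearman's rho from scratch (Pearson correlation over average ranks) on hypothetical per-review scores; the score values are illustrative and not from the paper.

```python
# Hypothetical example: correlating an automated metric's review-quality
# scores with human judgments via Spearman rank correlation. All score
# values below are made up for illustration.

def _ranks(values):
    """Average 1-based ranks, with ties sharing the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Find the run of tied values starting at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho = Pearson correlation of the two rank vectors."""
    rx, ry = _ranks(x), _ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical per-review scores: human Likert ratings vs. metric output.
human = [4, 2, 5, 1, 3]
metric = [0.8, 0.3, 0.7, 0.2, 0.5]
print(round(spearman(human, metric), 2))  # -> 0.9
```

In practice one would use `scipy.stats.spearmanr` for this; the pure-Python version is shown only to make the rank-correlation computation explicit.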