Despite the remarkable performance of Large Language Models (LLMs), their development faces a critical challenge in scalable oversight: providing effective feedback for tasks where human evaluation is difficult or where LLMs outperform humans. While there is growing interest in using LLMs for critique, current approaches still rely on human annotations or more powerful models, leaving unresolved the question of how to enhance critique capabilities without external supervision. We introduce SCRIT (Self-evolving CRITic), a framework that enables genuine self-evolution of critique abilities. Technically, SCRIT self-improves by training on synthetic data generated by a contrastive self-critic, which uses reference solutions to produce step-by-step critiques, and filtered by a self-validation mechanism, which ensures critique quality through correction outcomes. Implemented with Qwen2.5-72B-Instruct, one of the most powerful LLMs, SCRIT achieves up to a 10.3\% improvement on critique-correction and error identification benchmarks. Our analysis reveals that SCRIT's performance scales positively with data and model size, outperforms alternative approaches, and benefits critically from its self-validation component.
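To make the data-generation loop concrete, the following is a minimal Python sketch of the two components named above: the contrastive self-critic and outcome-based self-validation. The \texttt{llm} callable, the prompt wording, and helper names such as \texttt{extract\_final\_answer} and the dataset field names are illustrative assumptions, not the paper's exact implementation.

\begin{verbatim}
# A minimal sketch of SCRIT's synthetic-data loop. Assumes a generic
# llm(prompt) -> str completion function; prompt wording and helper
# names are illustrative, not the authors' exact implementation.

def contrastive_self_critique(llm, problem, student_solution,
                              reference_solution):
    """Critique a student solution step by step, contrasting it
    against a known-correct reference solution."""
    prompt = (
        f"Problem:\n{problem}\n\n"
        f"Reference solution (known correct):\n{reference_solution}\n\n"
        f"Student solution:\n{student_solution}\n\n"
        "Compare the student solution against the reference step by "
        "step, identify the first erroneous step (if any), explain "
        "the error, and write a corrected solution."
    )
    return llm(prompt)

def self_validate(critique, reference_answer, extract_final_answer):
    """Keep a critique only if the correction it proposes actually
    reaches the reference answer (validation via correction outcome)."""
    corrected = extract_final_answer(critique)
    return corrected is not None and corrected == reference_answer

def build_training_data(llm, dataset, extract_final_answer):
    """Collect validated (problem, solution, critique) triples that
    the model is later fine-tuned on."""
    pairs = []
    for ex in dataset:
        critique = contrastive_self_critique(
            llm, ex["problem"], ex["student_solution"],
            ex["reference_solution"],
        )
        if self_validate(critique, ex["reference_answer"],
                         extract_final_answer):
            pairs.append((ex["problem"], ex["student_solution"],
                          critique))
    return pairs
\end{verbatim}

In the paper's setting, the validated triples would then be used to fine-tune the same model that produced them (here, Qwen2.5-72B-Instruct), closing the self-evolution loop without any external supervision.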