Large-scale Chinese spelling correction (CSC) remains critical for real-world text processing, yet existing LLMs and supervised methods lack robustness to novel errors and depend on costly annotated data. We introduce CEC-Zero, a zero-supervision reinforcement learning framework that addresses this gap by enabling LLMs to correct their own mistakes. CEC-Zero synthesizes error-injected inputs from clean text, computes cluster-consensus rewards from semantic similarity and candidate agreement, and optimizes the policy with proximal policy optimization (PPO). It outperforms supervised baselines by 10--13 F$_1$ points and strong fine-tuned LLMs by 5--8 points across 9 benchmarks, with theoretical guarantees of unbiased rewards and convergence. CEC-Zero establishes a label-free paradigm for robust, scalable CSC, unlocking the potential of LLMs in noisy text pipelines.
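To make the reward signal concrete, the following is a minimal sketch of a cluster-consensus reward, written under the assumption that it mixes (i) the semantic similarity between each sampled candidate correction and the source sentence with (ii) the agreement of that candidate with the other candidates. The function names, the weighting parameter \texttt{alpha}, and the bag-of-characters embedding are illustrative assumptions, not the paper's exact formulation.

\begin{verbatim}
# Hedged sketch of a cluster-consensus reward (assumptions, not the
# paper's exact formulation): reward = alpha * cos(candidate, source)
#                                    + (1 - alpha) * mean agreement
# with the other sampled candidates.
from typing import Callable, List
import numpy as np

def cluster_consensus_rewards(
    source: str,
    candidates: List[str],
    embed: Callable[[List[str]], np.ndarray],
    alpha: float = 0.5,  # assumed weight between the two reward terms
) -> np.ndarray:
    """Return one scalar reward per candidate correction."""
    vecs = embed([source] + candidates)
    vecs = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
    src, cand = vecs[0], vecs[1:]
    src_sim = cand @ src            # semantic similarity to the source
    pair = cand @ cand.T            # pairwise candidate similarities
    k = len(candidates)
    # Average agreement with the rest of the cluster (drop self-similarity).
    agree = (pair.sum(axis=1) - 1.0) / max(k - 1, 1)
    return alpha * src_sim + (1.0 - alpha) * agree

# Toy bag-of-characters embedding standing in for a real sentence encoder.
def char_bow_embed(texts: List[str]) -> np.ndarray:
    vocab = {c: i for i, c in enumerate(sorted({c for t in texts for c in t}))}
    mat = np.zeros((len(texts), len(vocab)))
    for r, t in enumerate(texts):
        for c in t:
            mat[r, vocab[c]] += 1.0
    return mat

if __name__ == "__main__":
    noisy = "今天天汽很好"  # "汽" is a spelling error for "气"
    cands = ["今天天气很好", "今天天汽很好", "今天天气很好"]
    print(cluster_consensus_rewards(noisy, cands, char_bow_embed))
\end{verbatim}

In this sketch, candidates that agree with the majority of the sampled cluster and stay semantically close to the input receive higher rewards, which is the signal a PPO policy update would then maximize; a pretrained sentence encoder would replace the toy embedding in practice.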