Factual error correction (FEC) aims to revise factual errors in false claims with minimal editing so that they become faithful to the provided evidence. The task is crucial for alleviating the hallucination problem of large language models. Because paired data (false claims and their corresponding correct claims) is lacking, existing methods typically adopt a mask-then-correct paradigm. Since this paradigm relies solely on unpaired false and correct claims, such methods are referred to as distantly supervised. They require a masker to explicitly identify factual errors within a false claim before a corrector revises them. However, without paired data to train the masker, accurately pinpointing factual errors is challenging. To mitigate this, we propose to improve FEC by Learning to Inject Factual Errors (LIFE), a three-step distantly supervised method: mask-corrupt-correct. Specifically, we first train a corruptor with a mask-then-corrupt procedure so that it can deliberately introduce factual errors into correct text. We then apply the corruptor to correct claims to generate a substantial amount of paired data. Finally, we filter out low-quality data and use the remaining data to train a corrector. Notably, our corrector does not require a masker, which circumvents the bottleneck of explicit factual error identification. Experiments on a public dataset verify the effectiveness of LIFE in two respects. First, it outperforms the previous best-performing distantly supervised method by 10.59 points in SARI Final (a 19.3% improvement). Second, it surpasses ChatGPT prompted with in-context examples by 7.16 points in SARI Final.
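The data-generation half of the mask-corrupt-correct pipeline can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not the authors' implementation: the names `mask_one_token`, `toy_corruptor`, and `build_pairs` are hypothetical, the corruptor is a random word substitution standing in for a trained model, and the filter rule (keep a pair only if the replaced token is supported by the evidence) is an assumed placeholder for the quality filter, which the abstract does not specify.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Pair:
    false_claim: str    # synthetic claim containing an injected error
    correct_claim: str  # original claim, used as the correction target
    evidence: str


def mask_one_token(claim: str, rng: random.Random) -> Tuple[str, str]:
    """Replace one randomly chosen token with [MASK]; return (masked claim, original token)."""
    tokens = claim.split()
    i = rng.randrange(len(tokens))
    original = tokens[i]
    tokens[i] = "[MASK]"
    return " ".join(tokens), original


def toy_corruptor(masked: str, original: str, vocab: Sequence[str], rng: random.Random) -> str:
    """Stand-in for a trained corruptor: fill the mask with a token other than the original."""
    candidates = [w for w in vocab if w != original]
    return masked.replace("[MASK]", rng.choice(candidates), 1)


def build_pairs(examples: Sequence[Tuple[str, str]], vocab: Sequence[str], seed: int = 0) -> List[Pair]:
    """Turn (correct claim, evidence) examples into filtered (false, correct) training pairs."""
    rng = random.Random(seed)
    pairs = []
    for claim, evidence in examples:
        masked, original = mask_one_token(claim, rng)
        false_claim = toy_corruptor(masked, original, vocab, rng)
        # Assumed filter: the claim must actually change, and the token that was
        # replaced must appear in the evidence so the correction is supported.
        if false_claim != claim and original.lower() in evidence.lower():
            pairs.append(Pair(false_claim, claim, evidence))
    return pairs
```

In this sketch the resulting pairs would then serve as supervised training data for a corrector that maps a false claim and its evidence directly to the correct claim, so no masker is needed at correction time.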