Despite the widespread use of LLMs due to their superior performance across tasks, their high computational cost often leads potential users to opt for the pretraining-finetuning pipeline. However, biases prevalent in manually constructed datasets can introduce spurious correlations between tokens and labels, creating so-called shortcuts and hindering the generalizability of fine-tuned models. Existing debiasing methods often rely on prior knowledge of specific dataset biases, which is challenging to acquire a priori. We propose RAZOR (Rewriting And Zero-bias Optimization Refinement), a novel, unsupervised, and data-focused debiasing approach based on text rewriting for shortcut mitigation. RAZOR leverages LLMs to iteratively rewrite potentially biased text segments, replacing them with heuristically selected alternatives drawn from a shortcut space defined by token statistics and positional information. This process aims to align surface-level text features more closely with diverse label distributions, thereby promoting the learning of genuine linguistic patterns. Compared with unsupervised SoTA models, RAZOR improves F1 by 3.5% on FEVER and by 6.5% on the MNLI and SNLI datasets. Additionally, RAZOR effectively mitigates specific known biases, reducing the prevalence of bias-related terms by a factor of two without requiring prior bias information, a result on par with SoTA models that leverage such information. Our work prioritizes data manipulation over architectural modification, emphasizing the pivotal role of data quality in enhancing model performance and fairness. This research also contributes to developing more robust evaluation benchmarks for debiasing methods by incorporating metrics for both bias reduction and overall model efficacy.
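To make the shortcut-space idea concrete, the following is a minimal illustrative sketch, not the paper's implementation, of the token-statistics component: tokens whose conditional label distribution diverges sharply from the global label prior are flagged as shortcut candidates that an LLM rewrite would then target. The scoring statistic (total variation distance), the `min_count` threshold, and the toy data are all assumptions for illustration; RAZOR's actual shortcut space additionally incorporates positional information and is refined iteratively through LLM-based rewriting.

```python
from collections import Counter, defaultdict

def shortcut_scores(examples, min_count=5):
    """Score tokens by how skewed their label distribution is.

    `examples` is an iterable of (tokens, label) pairs. A token whose
    conditional label distribution p(label | token) diverges strongly
    from the global label prior p(label) is a candidate shortcut.
    Illustrative statistic only; RAZOR's shortcut space also uses
    positional information and is searched iteratively with LLM rewrites.
    """
    label_counts = Counter()
    token_label = defaultdict(Counter)
    for tokens, label in examples:
        label_counts[label] += 1
        for tok in set(tokens):  # count each token once per example
            token_label[tok][label] += 1

    total = sum(label_counts.values())
    prior = {lab: c / total for lab, c in label_counts.items()}

    scores = {}
    for tok, counts in token_label.items():
        n = sum(counts.values())
        if n < min_count:
            continue  # too rare for the statistic to be trustworthy
        # Total variation distance between p(label | token) and p(label)
        scores[tok] = 0.5 * sum(
            abs(counts[lab] / n - prior[lab]) for lab in prior
        )
    return scores

# Toy example: "amazing" and "dull" each co-occur with only one label,
# so they score high and would be targets for rewriting.
data = [
    (["an", "amazing", "film"], "pos"),
    (["an", "amazing", "ride"], "pos"),
    (["a", "dull", "film"], "neg"),
    (["a", "dull", "ride"], "neg"),
    (["an", "amazing", "story"], "pos"),
]
print(sorted(shortcut_scores(data, min_count=2).items(),
             key=lambda kv: -kv[1])[:3])
```

In this sketch, high-scoring tokens would be handed to an LLM with instructions to rewrite the surrounding segment using a label-neutral alternative, after which the statistics are recomputed, mirroring the iterative rewrite-and-rescore loop the abstract describes.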