The generation of undesirable and factually incorrect content by large language models (LLMs) poses a significant challenge and remains largely unsolved. This paper studies the integration of a contrastive learning objective into LLM fine-tuning for implicit knowledge editing and controlled text generation. Optimizing the training objective entails aligning text perplexities in a contrastive fashion. To train the model in a self-supervised fashion, we leverage an off-the-shelf LLM to generate the training data. We showcase applicability in the domain of detoxification, where the proposed approach leads to a significant decrease in the generation of toxic content while preserving general utility for downstream tasks such as commonsense reasoning and reading comprehension. The proposed approach is conceptually simple but empirically powerful.
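To make the idea of aligning perplexities contrastively more concrete, the following is a minimal sketch and not the paper's actual objective, which the abstract does not specify. It assumes a hinge (margin) loss on log-perplexity between a desirable (e.g. non-toxic) text and an undesirable (e.g. toxic) counterpart. The function names, the `margin` parameter, and the example log-probabilities are all hypothetical.

```python
import math


def log_perplexity(token_logprobs):
    """Log-perplexity: negative mean per-token log-probability (in nats)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return -sum(token_logprobs) / len(token_logprobs)


def perplexity(token_logprobs):
    """Perplexity is the exponential of the log-perplexity."""
    return math.exp(log_perplexity(token_logprobs))


def contrastive_perplexity_loss(pos_logprobs, neg_logprobs, margin=1.0):
    """Assumed hinge loss, for illustration only.

    The loss is zero once the undesirable text's log-perplexity exceeds the
    desirable text's log-perplexity by at least `margin`; otherwise it grows
    linearly with the shortfall.
    """
    gap = log_perplexity(neg_logprobs) - log_perplexity(pos_logprobs)
    return max(0.0, margin - gap)


# Hypothetical per-token log-probabilities under the model being fine-tuned.
non_toxic = [-0.5, -0.5]  # model already finds this text likely
toxic = [-2.0, -2.0]      # model already finds this text unlikely
loss_when_separated = contrastive_perplexity_loss(non_toxic, toxic)
loss_when_inverted = contrastive_perplexity_loss(toxic, non_toxic)
```

In this sketch the loss vanishes when the model already assigns lower perplexity to the desirable text by the chosen margin, and is positive when the ordering is inverted. In actual fine-tuning the log-probabilities would come from the model and the loss would be minimized by gradient descent.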