Text detoxification is a conditional text generation task aiming to remove offensive content from toxic text. It is highly useful for online forums and social media, where offensive content is frequently encountered. Intuitively, there are diverse ways to detoxify sentences while preserving their meanings, and we can select from detoxified sentences before displaying text to users. Conditional diffusion models are particularly suitable for this task given their demonstrated higher generative diversity than existing conditional text generation models based on language models. Nonetheless, text fluency declines when they are trained with insufficient data, which is the case for this task. In this work, we propose DiffuDetox, a mixed conditional and unconditional diffusion model for text detoxification. The conditional model takes toxic text as the condition and reduces its toxicity, yielding a diverse set of detoxified sentences. The unconditional model is trained to recover the input text, which allows the introduction of additional fluent text for training and thus ensures text fluency. Extensive experimental results and in-depth analysis demonstrate the effectiveness of our proposed DiffuDetox.
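As an illustration of the mixed training setup described above, the following minimal sketch shows one way the two kinds of training examples could be assembled. It covers only the data-mixing step, not the diffusion model itself. The names `TrainingExample` and `build_mixed_examples`, the use of `None` as the empty condition, and the placeholder strings are assumptions made for this sketch and are not taken from the DiffuDetox implementation.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class TrainingExample:
    # Toxic source text for the conditional case; None marks the
    # unconditional case (an assumption of this sketch).
    condition: Optional[str]
    # Text the denoiser is trained to reconstruct.
    target: str


def build_mixed_examples(
    parallel_pairs: List[Tuple[str, str]],
    fluent_texts: List[str],
    seed: int = 0,
) -> List[TrainingExample]:
    """Combine conditional and unconditional examples into one training pool."""
    examples: List[TrainingExample] = []

    # Conditional examples: the toxic sentence is the condition and the
    # detoxified sentence is the reconstruction target.
    for toxic, detoxified in parallel_pairs:
        examples.append(TrainingExample(condition=toxic, target=detoxified))

    # Unconditional examples: no condition, and the model only has to
    # recover the input text, so any additional fluent text can be used.
    for text in fluent_texts:
        examples.append(TrainingExample(condition=None, target=text))

    # Shuffle with a local RNG so the mix is reproducible for a given seed.
    random.Random(seed).shuffle(examples)
    return examples


if __name__ == "__main__":
    # Placeholder strings stand in for real sentences.
    pairs = [("<toxic sentence 1>", "<detoxified sentence 1>")]
    extra = ["<fluent sentence A>", "<fluent sentence B>"]
    for example in build_mixed_examples(pairs, extra):
        print(example)
```

In this sketch the unconditional pool is what lets fluent text without a toxic counterpart enter training, which is the mechanism the abstract credits for preserving fluency when parallel data is scarce.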