To date, toxicity mitigation in language models has focused almost entirely on single-language settings. As language models embrace multilingual capabilities, it is crucial that our safety measures keep pace. Recognizing this research gap, our approach expands the scope of conventional toxicity mitigation to address the complexities presented by multiple languages. In the absence of sufficient annotated datasets across languages, we employ translated data to evaluate and enhance our mitigation techniques. We also compare finetuning mitigation approaches against retrieval-augmented techniques under both static and continual toxicity mitigation scenarios. This allows us to examine the effects of translation quality and cross-lingual transfer on toxicity mitigation, as well as how model size and data quantity affect the success of these mitigation efforts. Covering nine languages, our study spans a broad array of linguistic families and levels of resource availability, ranging from high- to mid-resource languages. Through comprehensive experiments, we provide insights into the complexities of multilingual toxicity mitigation, paving the way for future research in this increasingly important field. Code and data are available at https://github.com/for-ai/goodtriever.
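To illustrate the retrieval-augmented family of methods the abstract refers to, the sketch below shows one common pattern: at each decoding step, nearest-neighbor lookups into a "toxic" and a "non-toxic" datastore yield two next-token distributions, which then reweight the base language model's probabilities in a product-of-experts fashion. All function names, the distance-based weighting, and the combination formula here are illustrative assumptions for exposition, not the paper's exact formulation.

```python
import numpy as np

def knn_distribution(hidden, datastore_keys, datastore_next_tokens,
                     vocab_size, k=2, temperature=1.0):
    """Convert the k nearest datastore entries into a next-token distribution.

    hidden: query vector for the current decoding step (illustrative).
    datastore_keys: stored context vectors; datastore_next_tokens: the token
    that followed each stored context.
    """
    dists = np.linalg.norm(datastore_keys - hidden, axis=1)
    nearest = np.argsort(dists)[:k]
    weights = np.exp(-dists[nearest] / temperature)  # closer neighbors weigh more
    probs = np.zeros(vocab_size)
    for idx, w in zip(nearest, weights):
        probs[datastore_next_tokens[idx]] += w
    return probs / probs.sum()

def detoxified_next_token_probs(lm_probs, p_nontoxic, p_toxic, alpha=1.0):
    """Product-of-experts-style reweighting (illustrative assumption):
    boost tokens favored by the non-toxic datastore and suppress tokens
    favored by the toxic one, then renormalize."""
    scores = lm_probs * (p_nontoxic / np.maximum(p_toxic, 1e-9)) ** alpha
    return scores / scores.sum()
```

For example, a token the non-toxic datastore strongly prefers ends up with higher probability than an equally likely token preferred by the toxic datastore, which is the intuition behind datastore-based mitigation being easy to update continually: adding new examples to a datastore requires no retraining.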