To date, toxicity mitigation in language models has focused almost entirely on single-language settings. As language models embrace multilingual capabilities, it is crucial that our safety measures keep pace. Recognizing this research gap, our approach expands the scope of conventional toxicity mitigation to address the complexities presented by multiple languages. In the absence of sufficient annotated datasets across languages, we employ translated data to evaluate and enhance our mitigation techniques. We also compare finetuning mitigation approaches against retrieval-augmented techniques under both static and continual toxicity mitigation scenarios. This setup allows us to examine the effects of translation quality and cross-lingual transfer on toxicity mitigation. We also explore how model size and data quantity affect the success of these mitigation efforts. Covering nine languages, our study represents a broad array of linguistic families and levels of resource availability, ranging from high- to mid-resource languages. Through comprehensive experiments, we provide insights into the complexities of multilingual toxicity mitigation, paving the way for future research in this increasingly important field. Code and data are available at https://github.com/for-ai/goodtriever.