Large language models (LLMs) are increasingly popular but are also prone to generating biased, toxic, or harmful language, which can have detrimental effects on individuals and communities. Although considerable effort has been devoted to assessing and mitigating toxicity in generated content, this work has focused primarily on English, and it is essential to consider other languages as well. To address this issue, we create and release FrenchToxicityPrompts, a dataset of 50K naturally occurring French prompts and their continuations, annotated with toxicity scores from a widely used toxicity classifier. We evaluate 14 different models from four prevalent open-source LLM families against our dataset to assess their potential toxicity across various dimensions. We hope that our contribution will foster future research on toxicity detection and mitigation beyond English.