Caution: this paper may include material that could be offensive or distressing. The advent of Large Language Models (LLMs) necessitates training approaches that mitigate the generation of unethical language and handle toxic user queries appropriately. Given the cost of human labor and the scarcity of data, we present KoTox, a dataset of 39K unethical instruction-output pairs. This collection of automatically generated toxic instructions refines the training of LLMs and establishes a foundational framework for improving LLMs' ethical awareness and their responses to various toxic inputs, promoting more secure and responsible interactions in Natural Language Processing (NLP) applications.