Pornographic content in human-machine interaction dialogues can have severe negative effects on users of open-domain dialogue systems. Yet detecting pornographic language in such dialogues, although important, has rarely been studied. To advance in this direction, we introduce CensorChat, a dialogue monitoring dataset for detecting whether a dialogue session contains pornographic content. We collect real-life human-machine interaction dialogues in the wild and break them down into single utterances and single-turn dialogues, with the last utterance spoken by the chatbot. We propose using knowledge distillation from large language models to annotate the dataset. First, the raw dataset is annotated by four open-source large language models, with the majority vote determining the label. Second, we use ChatGPT to fill in the labels left empty by the first step. Third, to ensure the quality of the validation and test sets, we use GPT-4 for label calibration: if the current label does not match the one generated by GPT-4, we employ a self-criticism strategy to verify its correctness. Finally, to facilitate the detection of pornographic text, we develop a series of text classifiers using the pseudo-labeled dataset. Detailed data analysis demonstrates that knowledge distillation from large language models provides a practical and cost-efficient way to develop pornographic text detectors.
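The first two annotation steps described in the abstract (a majority vote over four open-source models, then a fallback model for empty labels) might be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the label names `"porn"` and `"safe"`, the function names `majority_vote` and `annotate`, and the assumption that a label is left empty when the vote is tied or no model returns a usable answer are all hypothetical. The GPT-4 calibration and self-criticism step is not covered here.

```python
from collections import Counter
from typing import Callable, Optional, Sequence

# Hypothetical label type; the abstract does not name the label set.
Label = str
Annotator = Callable[[str], Optional[Label]]


def majority_vote(votes: Sequence[Optional[Label]]) -> Optional[Label]:
    """Return the most common valid vote, or None if there is no clear winner.

    Assumption: a tie, or no valid vote at all, leaves the label empty.
    """
    valid = [v for v in votes if v is not None]
    if not valid:
        return None
    ranked = Counter(valid).most_common()
    top_label, top_count = ranked[0]
    # A second label with the same count means the vote is tied.
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return None
    return top_label


def annotate(text: str, voters: Sequence[Annotator], fallback: Annotator) -> Optional[Label]:
    """Step 1: vote among the open-source models. Step 2: fill empty labels."""
    label = majority_vote([voter(text) for voter in voters])
    if label is None:
        # Stand-in for the stronger model that fills labels left empty.
        label = fallback(text)
    return label
```

In use, each entry of `voters` would wrap a call to one open-source model and `fallback` would wrap the stronger model; both are plain callables here so that the voting logic can be checked without any model access.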