Online communication increasingly amplifies toxic language, and recent research actively explores methods for detecting and rewriting such content. Existing studies, however, focus primarily on non-obfuscated text, which limits robustness when users intentionally disguise toxic expressions. Korean is particularly vulnerable, because its agglutinative nature makes toxic expressions easy to disguise. Yet obfuscation in Korean remains largely unexplored, which motivates us to introduce KOTOX, a Korean toxic dataset for deobfuscation and detoxification. We categorize Korean obfuscation patterns into linguistically grounded classes and define transformation rules derived from real-world examples. Using these rules, we provide paired neutral and toxic sentences alongside their obfuscated counterparts. Models trained on our dataset handle obfuscated text better without sacrificing performance on non-obfuscated text. This is the first dataset that simultaneously supports deobfuscation and detoxification for Korean. We expect it to facilitate a better understanding and mitigation of obfuscated toxic content in LLMs for Korean. Our code and data are available at https://github.com/leeyejin1231/KOTOX.
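The abstract does not list the transformation rules themselves. As a purely illustrative sketch, assuming jamo splitting (writing each Hangul syllable as its separate consonant and vowel letters) as one example rule, a rule can be expressed as a string-to-string function and applied to build a paired record. The names `split_jamo`, `PairedExample`, and `make_example` are hypothetical and are not taken from the KOTOX codebase; only the Hangul decomposition arithmetic follows the Unicode standard.

```python
from dataclasses import dataclass
from typing import Callable

# Standard Unicode Hangul syllable block arithmetic:
# syllable = 0xAC00 + (initial * 21 + medial) * 28 + final
SBASE = 0xAC00
NUM_INITIALS, NUM_MEDIALS, NUM_FINALS = 19, 21, 28
NUM_SYLLABLES = NUM_INITIALS * NUM_MEDIALS * NUM_FINALS  # 11172

# Hangul Compatibility Jamo, in Unicode decomposition order.
INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
MEDIALS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
FINALS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")


def split_jamo(text: str) -> str:
    """Hypothetical obfuscation rule: split every Hangul syllable into jamo.

    Characters outside the precomposed Hangul block pass through unchanged.
    """
    out = []
    for ch in text:
        idx = ord(ch) - SBASE
        if 0 <= idx < NUM_SYLLABLES:
            initial, rest = divmod(idx, NUM_MEDIALS * NUM_FINALS)
            medial, final = divmod(rest, NUM_FINALS)
            out.append(INITIALS[initial] + MEDIALS[medial] + FINALS[final])
        else:
            out.append(ch)
    return "".join(out)


@dataclass
class PairedExample:
    """Hypothetical record pairing a neutral sentence with a toxic one and its obfuscated form."""
    neutral: str
    toxic: str
    obfuscated_toxic: str


def make_example(neutral: str, toxic: str,
                 rule: Callable[[str], str] = split_jamo) -> PairedExample:
    """Apply one obfuscation rule to the toxic side of a neutral/toxic pair."""
    return PairedExample(neutral=neutral, toxic=toxic, obfuscated_toxic=rule(toxic))


if __name__ == "__main__":
    print(split_jamo("안녕"))  # ㅇㅏㄴㄴㅕㅇ
```

Modeling each rule as a plain `str -> str` function keeps rules composable, so several could be chained to produce harder obfuscations, and the deobfuscation target is simply the original `toxic` field of the record.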


