Online communication increasingly amplifies toxic language, and recent research actively explores methods for detecting and rewriting such content. Existing studies primarily focus on non-obfuscated text, which limits robustness when users intentionally disguise toxic expressions. Korean in particular allows toxic expressions to be disguised easily because of its agglutinative morphology. However, obfuscation in Korean remains largely unexplored, which motivates us to introduce KOTOX, a Korean toxic dataset for deobfuscation and detoxification. We categorize Korean obfuscation patterns into linguistically grounded classes and define transformation rules derived from real-world examples. Using these rules, we provide paired neutral and toxic sentences alongside their obfuscated counterparts. Models trained on our dataset handle obfuscated text better without sacrificing performance on non-obfuscated text. This is the first dataset that simultaneously supports deobfuscation and detoxification for Korean. We expect it to facilitate better understanding and mitigation of obfuscated toxic content in LLMs for Korean. Our code and data are available at https://github.com/leeyejin1231/KOTOX.
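To make the notion of a rule-based obfuscation transform concrete, here is a minimal sketch of one pattern commonly seen in Korean text: decomposing precomposed Hangul syllables into their visible jamo via standard Unicode arithmetic. This is an illustrative assumption, not the actual KOTOX rule set, and the function name `jamo_obfuscate` is hypothetical.

```python
# Illustrative only: one hypothetical rule-based Korean obfuscation pattern
# (jamo decomposition); not the actual transformation rules used in KOTOX.

# Standard jamo tables for decomposing precomposed Hangul syllables
# (Unicode block U+AC00..U+D7A3, laid out as 19 x 21 x 28 combinations).
CHOSUNG = list("ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ")                      # 19 initial consonants
JUNGSUNG = list("ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ")                 # 21 medial vowels
JONGSUNG = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")  # 27 finals + "no final"

def jamo_obfuscate(text: str) -> str:
    """Disguise a string by splitting each Hangul syllable into visible jamo."""
    out = []
    for ch in text:
        idx = ord(ch) - 0xAC00
        if 0 <= idx < 19 * 21 * 28:           # precomposed Hangul syllable
            initial, rest = divmod(idx, 21 * 28)
            medial, final = divmod(rest, 28)
            out.append(CHOSUNG[initial] + JUNGSUNG[medial] + JONGSUNG[final])
        else:                                  # leave non-Hangul characters as-is
            out.append(ch)
    return "".join(out)

print(jamo_obfuscate("바보"))  # -> "ㅂㅏㅂㅗ" ("fool", split into jamo)
```

A deobfuscation model would learn the inverse mapping, recomposing jamo sequences back into syllables before toxicity detection or rewriting is applied.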