Toxic language is one of the major barriers to safe online participation, yet robust mitigation tools are scarce for African languages. This study addresses that gap by investigating automatic text detoxification (toxic-to-neutral rewriting) for two low-resource African languages, isiXhosa and Yorùbá. The work contributes a pragmatic hybrid methodology with two components: a lightweight, interpretable TF-IDF and Logistic Regression model for transparent toxicity detection, and a controlled lexicon- and token-guided rewriting component. A parallel corpus of toxic-to-neutral rewrites, which captures idiomatic usage, diacritics, and code-switching, was developed to train and evaluate the model. The detection component achieved stratified K-fold accuracies of 61-72% (isiXhosa) and 72-86% (Yorùbá), with per-language ROC-AUCs of up to 0.88. The rewriting component detoxified every sentence that the detector flagged as toxic while leaving 100% of non-toxic sentences unchanged. These results demonstrate that scalable, interpretable machine learning detectors combined with rule-based edits offer a competitive and resource-efficient solution for culturally adaptive safety tooling, setting a new benchmark for low-resource Text Style Transfer (TST) in African languages.
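The detect-then-rewrite flow described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `LEXICON` entries are invented English placeholders, `lexicon_detector` is a hypothetical stand-in for the trained TF-IDF and Logistic Regression classifier, and the function names are assumptions made for illustration. The sketch shows only the control flow, in which sentences the detector does not flag are returned untouched and flagged sentences have lexicon tokens swapped for neutral ones.

```python
import re
import unicodedata
from typing import Callable, Dict

# Hypothetical placeholder lexicon (toxic token -> neutral replacement).
# Real entries would come from a curated parallel corpus; these are invented.
LEXICON: Dict[str, str] = {
    "idiot": "person",
    "stupid": "unwise",
}

# Word characters plus combining diacritical marks (U+0300 to U+036F),
# so tone-marked letters stay inside a single token.
TOKEN_RE = re.compile(r"[\w\u0300-\u036f]+")


def normalize(text: str) -> str:
    """NFC-normalize so equivalent diacritic encodings compare equal."""
    return unicodedata.normalize("NFC", text)


def lexicon_detector(sentence: str) -> bool:
    """Stand-in detector: flag a sentence if any token is in the lexicon."""
    tokens = TOKEN_RE.findall(normalize(sentence))
    return any(tok.lower() in LEXICON for tok in tokens)


def detoxify(sentence: str,
             is_toxic: Callable[[str], bool] = lexicon_detector) -> str:
    """Rewrite only sentences the detector flags; return others unchanged."""
    if not is_toxic(sentence):
        return sentence

    def swap(match: "re.Match[str]") -> str:
        token = match.group(0)
        neutral = LEXICON.get(token.lower())
        if neutral is None:
            return token  # not in the lexicon: keep the original token
        # Carry over an initial capital from the original token.
        return neutral.capitalize() if token[0].isupper() else neutral

    return TOKEN_RE.sub(swap, normalize(sentence))


if __name__ == "__main__":
    print(detoxify("That idea is stupid."))
    print(detoxify("Good morning, Yorùbá speakers."))
```

Passing the detector in as a callable keeps the rewriting step independent of the classifier, so a trained model's prediction function could replace `lexicon_detector` without changing `detoxify`.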