Toxic text classification for online moderation remains challenging under extreme class imbalance, where rare but high-risk labels such as threat and severe_toxic are consistently underdetected by conventional models. We propose CoGate-LSTM, a parameter-efficient recurrent architecture built around a novel cosine-similarity feature gating mechanism that adaptively rescales token embeddings by their directional similarity to a learned toxicity prototype. Unlike token-position attention, the gate emphasizes the feature directions most informative for minority toxic classes. The model combines frozen multi-source embeddings (GloVe, FastText, and BERT-CLS), a character-level BiLSTM, embedding-space SMOTE, and weighted focal loss. On the Jigsaw Toxic Comment benchmark, CoGate-LSTM achieves 0.881 macro-F1 (95% CI: [0.873, 0.889]) and 96.0% accuracy, outperforming fine-tuned BERT by 6.9 macro-F1 points (p < 0.001) and XGBoost by 4.7 points, while using only 7.3M parameters (about 15$\times$ fewer than BERT) and running at 48 ms CPU inference latency. Gains are strongest on minority labels, with relative F1 improvements over fine-tuned BERT of +71% for severe_toxic, +33% for threat, and +28% for identity_hate. Ablations identify cosine gating as the primary driver of performance (-4.8 macro-F1 points when removed), with additional contributions from character-level fusion (-2.4) and multi-head attention (-2.9). CoGate-LSTM also transfers reasonably across datasets, reaching 0.71 macro-F1 zero-shot on the Contextual Abuse Dataset and 0.73 with lightweight threshold adaptation. These results show that direction-aware feature gating offers an effective and efficient alternative to large, fully fine-tuned transformers for classifying imbalanced toxic comments.
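The abstract describes the gate only in words, so the following is a minimal sketch of one way a cosine-similarity gate could rescale token embeddings; it is not the CoGate-LSTM implementation. The function name `cosine_gate`, the sigmoid squashing, the `temperature` parameter, and the use of a single scalar gate per token are all assumptions made for illustration, and the prototype here is a fixed vector rather than a learned parameter.

```python
import numpy as np

def cosine_gate(embeddings, prototype, temperature=1.0, eps=1e-8):
    """Rescale token embeddings by their cosine similarity to a prototype.

    embeddings: array of shape (seq_len, dim), one row per token.
    prototype:  array of shape (dim,); stands in for the learned
                toxicity prototype (hypothetical, fixed here).
    Returns the gated embeddings and the per-token gate values.
    """
    emb_norm = np.linalg.norm(embeddings, axis=1)   # (seq_len,)
    proto_norm = np.linalg.norm(prototype)          # scalar
    # Cosine similarity depends only on direction, not on magnitude.
    cos = embeddings @ prototype / (emb_norm * proto_norm + eps)
    # Assumed squashing: a sigmoid maps similarity in [-1, 1] to a gate in (0, 1).
    gate = 1.0 / (1.0 + np.exp(-cos / temperature))
    return embeddings * gate[:, None], gate

# Toy check: a token aligned with the prototype, one opposed to it, one random.
rng = np.random.default_rng(0)
prototype = rng.normal(size=8)
tokens = np.stack([2.0 * prototype, -prototype, rng.normal(size=8)])
gated, gate = cosine_gate(tokens, prototype)
```

In this sketch the token pointing along the prototype receives the largest gate and the opposed token the smallest, regardless of their norms, which is the directional behaviour the abstract attributes to the mechanism. A per-feature gate, or a different squashing function, would be equally consistent with the description given.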