Online abuse has grown increasingly complex, spanning toxic language, harassment, manipulation, and fraudulent behavior. Traditional machine-learning approaches, which depend on static classifiers and labor-intensive labeling, struggle to keep pace with evolving threat patterns and nuanced policy requirements. Large Language Models (LLMs) introduce new capabilities for contextual reasoning, policy interpretation, explanation generation, and cross-modal understanding, enabling them to support multiple stages of modern safety systems. This survey provides a lifecycle-oriented analysis of how LLMs are being integrated into the Abuse Detection Lifecycle (ADL), which we define across four stages: (I) Label \& Feature Generation, (II) Detection, (III) Review \& Appeals, and (IV) Auditing \& Governance. For each stage, we synthesize emerging research and industry practices, highlight architectural considerations for production deployment, and examine the strengths and limitations of LLM-driven approaches. We conclude by outlining key challenges, including latency, cost-efficiency, determinism, adversarial robustness, and fairness, and by discussing future research directions needed to operationalize LLMs as reliable, accountable components of large-scale abuse-detection and governance systems.