With the increasing integration of large language models (LLMs) into open-domain writing, detecting machine-generated text has become a critical task for ensuring content authenticity and trust. Existing approaches rely on statistical discrepancies or model-specific heuristics to distinguish between LLM-generated and human-written text. However, these methods struggle in real-world scenarios due to limited generalization, vulnerability to paraphrasing, and lack of explainability, particularly when facing stylistic diversity or hybrid human-AI authorship. In this work, we propose StyleDecipher, a robust and explainable detection framework that revisits LLM-generated text detection using combined feature extractors to quantify stylistic differences. By jointly modeling discrete stylistic indicators and continuous stylistic representations derived from semantic embeddings, StyleDecipher captures distinctive style-level divergences between human and LLM outputs within a unified representation space. This framework enables accurate, explainable, and domain-agnostic detection without requiring access to model internals or labeled segments. Extensive experiments across five diverse domains, including news, code, essays, reviews, and academic abstracts, demonstrate that StyleDecipher consistently achieves state-of-the-art in-domain accuracy. Moreover, in cross-domain evaluations, it surpasses existing baselines by up to 36.30%, while maintaining robustness against adversarial perturbations and mixed human-AI content. Further qualitative and quantitative analysis confirms that stylistic signals provide explainable evidence for distinguishing machine-generated text. Our source code can be accessed at https://github.com/SiyuanLi00/StyleDecipher.
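To make the idea of a unified representation concrete, the following is a minimal sketch of how discrete stylistic indicators and a continuous text representation might be placed in one feature vector. It is not the StyleDecipher implementation: the three indicators, the function names, and the hashed-trigram `toy_embedding` (a stand-in for a real semantic embedding model) are all assumptions chosen for illustration.

```python
import hashlib
import math
import re


def discrete_style_features(text: str) -> list[float]:
    """Three illustrative stylistic indicators (assumed, not the paper's feature set)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text.lower())
    if not words:
        return [0.0, 0.0, 0.0]
    avg_sentence_len = len(words) / max(len(sentences), 1)   # words per sentence
    type_token_ratio = len(set(words)) / len(words)          # lexical diversity
    punct_ratio = sum(ch in ",;:-()\"" for ch in text) / len(text)
    return [avg_sentence_len, type_token_ratio, punct_ratio]


def toy_embedding(text: str, dim: int = 16) -> list[float]:
    """Hypothetical stand-in for a semantic embedding: hashed character
    trigrams, L2-normalised. A real system would call an embedding model."""
    vec = [0.0] * dim
    lowered = text.lower()
    for i in range(len(lowered) - 2):
        digest = hashlib.md5(lowered[i:i + 3].encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def style_vector(text: str, dim: int = 16) -> list[float]:
    """Concatenate discrete and continuous parts into one representation."""
    return discrete_style_features(text) + toy_embedding(text, dim)
```

A vector such as `style_vector(document)` could then be passed to any standard classifier. The discrete entries remain individually readable, which is one plausible way in which stylistic signals could offer explainable evidence.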