The wide usage of LLMs creates a pressing need to detect AI participation in texts. Existing studies investigate such detection in scattered contexts, leaving a systematic and unified approach unexplored. In this paper, we present HART, a hierarchical framework of AI risk levels, each corresponding to a detection task. To address these tasks, we propose a novel 2D detection method that decouples a text into its content and its language expression. Our findings show that content is resistant to surface-level changes, which makes it a key feature for detection. Experiments demonstrate that the 2D method significantly outperforms existing detectors, improving AUROC from 0.705 to 0.849 on level-2 detection and from 0.807 to 0.886 on RAID. We release our data and code at https://github.com/baoguangsheng/truth-mirror.