Towards Detecting AI-Generated Text within Human-AI Collaborative Hybrid Texts

This study explores the challenge of sentence-level AI-generated text detection within human-AI collaborative hybrid texts. Existing studies of AI-generated text detection for hybrid texts often rely on synthetic datasets, which typically contain hybrid texts with only a limited number of authorship boundaries. We contend that studies of detecting AI-generated content within hybrid texts should cover different types of hybrid texts generated in realistic settings to better inform real-world applications. Therefore, our study uses the CoAuthor dataset, which includes diverse, realistic hybrid texts produced through multi-turn collaboration between human writers and an intelligent writing system. We adopt a two-step, segmentation-based pipeline: (i) detect segments within a given hybrid text, where each segment contains sentences of consistent authorship, and (ii) classify the authorship of each identified segment. Our empirical findings highlight that (1) detecting AI-generated sentences in hybrid texts is overall a challenging task because (1.1) human writers select, and even edit, AI-generated sentences according to personal preferences, which makes the authorship of segments harder to identify; (1.2) frequent changes of authorship between neighboring sentences make it difficult for segment detectors to identify authorship-consistent segments; (1.3) the short length of segments within hybrid texts provides limited stylistic cues for reliable authorship determination; and (2) before starting the detection process, it is beneficial to assess the average length of segments within the hybrid text. This assessment helps decide whether (2.1) to employ a text segmentation-based strategy for hybrid texts with longer segments, or (2.2) to adopt a direct sentence-by-sentence classification strategy for those with shorter segments.
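
The two-step pipeline and the length-based choice of strategy can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation: the segment detector (`is_boundary`, simplified here to a pairwise check between adjacent sentences) and the authorship classifier (`classify_segment`) are hypothetical placeholders for trained models, the threshold of 3 sentences is an arbitrary assumption, and the expected average segment length is assumed to be estimated beforehand, for example from labeled texts of the same kind.

```python
from typing import Callable, List, Sequence

Label = str  # "human" or "ai"

def segment_then_classify(
    sentences: Sequence[str],
    is_boundary: Callable[[str, str], bool],  # hypothetical segment detector
    classify_segment: Callable[[List[str]], Label],  # hypothetical authorship classifier
) -> List[Label]:
    """Step (i): split into authorship-consistent segments; step (ii): label each segment.
    Returns one label per sentence."""
    if not sentences:
        return []
    segments: List[List[str]] = [[sentences[0]]]
    for prev, cur in zip(sentences, sentences[1:]):
        if is_boundary(prev, cur):
            segments.append([cur])  # authorship change detected: start a new segment
        else:
            segments[-1].append(cur)
    labels: List[Label] = []
    for seg in segments:
        label = classify_segment(seg)
        labels.extend([label] * len(seg))  # every sentence inherits its segment's label
    return labels

def sentence_by_sentence(
    sentences: Sequence[str], classify_segment: Callable[[List[str]], Label]
) -> List[Label]:
    """Direct strategy: classify each sentence on its own."""
    return [classify_segment([s]) for s in sentences]

def average_segment_length(labels: Sequence[Label]) -> float:
    """Mean number of consecutive sentences sharing the same authorship label."""
    if not labels:
        return 0.0
    runs = 1 + sum(1 for a, b in zip(labels, labels[1:]) if a != b)
    return len(labels) / runs

def choose_strategy(expected_avg_len: float, threshold: float = 3.0) -> str:
    """Longer segments favor segmentation; shorter ones favor per-sentence classification."""
    return "segmentation" if expected_avg_len >= threshold else "sentence-by-sentence"

def detect(
    sentences: Sequence[str],
    is_boundary: Callable[[str, str], bool],
    classify_segment: Callable[[List[str]], Label],
    expected_avg_len: float,
    threshold: float = 3.0,
) -> List[Label]:
    if choose_strategy(expected_avg_len, threshold) == "segmentation":
        return segment_then_classify(sentences, is_boundary, classify_segment)
    return sentence_by_sentence(sentences, classify_segment)
```

As a toy usage example, sentences prefixed with "[AI]" can stand in for AI-generated text: with `is_boundary = lambda a, b: a.startswith("[AI]") != b.startswith("[AI]")`, the text ["I walked home.", "[AI] The sky was vast.", "[AI] Stars glittered.", "I smiled."] splits into three segments with an average length of about 1.33 sentences, so `choose_strategy` falls back to sentence-by-sentence classification.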

翻译：本研究探索了在人机协作混合文本中，于句子层面检测AI生成内容所面临的挑战。现有针对混合文本中AI生成内容检测的研究，通常依赖合成数据集。这些数据集往往包含边界数量有限的混合文本。我们认为，对混合文本中AI生成内容检测的研究，应涵盖真实场景下生成的不同类型混合文本，以更好地服务于实际应用。为此，本研究采用了CoAuthor数据集，该数据集包含通过人类作者与智能写作系统在多轮交互中协作生成的多样化、真实混合文本。我们采用基于分割的两步法流程：(i) 检测给定混合文本中的片段，每个片段包含作者一致的句子；(ii) 分类每个识别片段的作者归属。我们的实证研究结果强调：(1) 检测混合文本中AI生成句子总体具有挑战性，因为(1.1) 人类作者基于个人偏好选择和甚至编辑AI生成句子，增加了识别片段作者归属的难度；(1.2) 混合文本中相邻句子作者归属的频繁变化，给片段检测器识别作者一致的片段带来困难；(1.3) 混合文本中文本片段长度较短，为可靠的作者归属判断提供的风格线索有限；(2) 在启动检测流程前，评估混合文本中片段的平均长度是有益的。这一评估有助于决定：(2.1) 对较长片段的混合文本采用基于文本分割的策略，或(2.2) 对较短片段的混合文本采用直接的逐句分类策略。