In the age of advanced large language models (LLMs), the boundaries between human- and AI-generated text are becoming increasingly blurred. We address the challenge of segmenting mixed-authorship text, that is, identifying the transition points at which authorship shifts from human to AI or vice versa, a problem with critical implications for authenticity, trust, and human oversight. We introduce Info-Mask, a novel framework for mixed-authorship detection that integrates stylometric cues, perplexity-driven signals, and structured boundary modeling to accurately segment collaborative human-AI content. To evaluate the robustness of our system against adversarial perturbations, we construct and release an adversarial benchmark dataset, Mixed-text Adversarial setting for Segmentation (MAS), designed to probe the limits of existing detectors. Beyond segmentation accuracy, we introduce Human-Interpretable Attribution (HIA) overlays that highlight how stylometric features inform boundary predictions, and we conduct a small-scale human study assessing their usefulness. Across multiple architectures, Info-Mask significantly improves span-level robustness under adversarial conditions, establishing new baselines while revealing remaining challenges. Our findings highlight both the promise and the limitations of adversarially robust, interpretable mixed-authorship detection, with implications for trust and oversight in human-AI co-authorship.