The widespread adoption of large language models (LLMs) has created an urgent need for robust tools to detect LLM-generated text, especially in light of \textit{paraphrasing} techniques that often evade existing detection methods. To address this challenge, we present a novel semantic-enhanced framework for detecting LLM-generated text (SEFD) that leverages a retrieval-based mechanism to fully utilize text semantics. Our framework improves upon existing detection methods by systematically integrating retrieval-based techniques with traditional detectors, employing a carefully curated retrieval mechanism that strikes a balance between comprehensive coverage and computational efficiency. We showcase the effectiveness of our approach in sequential text scenarios common in real-world applications, such as online forums and Q\&A platforms. Through comprehensive experiments across various LLM-generated texts and detection methods, we demonstrate that our framework substantially enhances detection accuracy in paraphrasing scenarios while maintaining robustness for standard LLM-generated content.