This paper introduces AIDetx, a novel method for detecting machine-generated text using data compression techniques. Traditional approaches, such as deep learning classifiers, often suffer from high computational costs and limited interpretability. To address these limitations, we propose a compression-based classification framework built on finite-context models (FCMs). AIDetx constructs separate compression models for human-written and AI-generated text and classifies a new input according to which model compresses it better, that is, which achieves the higher compression ratio. We evaluated AIDetx on two benchmark datasets, achieving F1 scores above 97% and 99%, respectively. Compared with current methods, such as those based on large language models (LLMs), AIDetx offers a more interpretable and computationally efficient solution, significantly reducing both training time and hardware requirements (e.g., no GPUs are needed). The full implementation is publicly available at https://github.com/AIDetx/AIDetx.
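The classification rule can be illustrated with a minimal sketch. It is not the AIDetx implementation: the abstract does not specify the model orders, smoothing, or any mixing of multiple FCMs. The sketch therefore assumes a single order-`k` model over UTF-8 bytes with additive smoothing `alpha`. The class `FiniteContextModel`, the function `classify`, and the parameter values are all hypothetical. Each model estimates how many bits it would need to encode a text, and the text is assigned to the class whose model needs fewer bits (a higher compression ratio for the same input).

```python
import math
from collections import defaultdict


class FiniteContextModel:
    """Hypothetical order-k finite-context model with additive smoothing.

    k, alpha and the 256-symbol byte alphabet are illustrative assumptions,
    not AIDetx's actual configuration.
    """

    def __init__(self, k=3, alpha=0.1, alphabet_size=256):
        self.k = k
        self.alpha = alpha
        self.alphabet_size = alphabet_size
        # counts[context][symbol] = times `symbol` followed `context` in training
        self.counts = defaultdict(lambda: defaultdict(int))
        self.totals = defaultdict(int)

    def train(self, text):
        data = text.encode("utf-8")
        for i in range(self.k, len(data)):
            ctx = data[i - self.k:i]
            self.counts[ctx][data[i]] += 1
            self.totals[ctx] += 1

    def bits(self, text):
        """Estimated code length of `text` in bits (first k bytes are not coded)."""
        data = text.encode("utf-8")
        total_bits = 0.0
        for i in range(self.k, len(data)):
            ctx = data[i - self.k:i]
            # .get avoids inserting unseen contexts while scoring
            ctx_counts = self.counts.get(ctx, {})
            n = ctx_counts.get(data[i], 0)
            total = self.totals.get(ctx, 0)
            p = (n + self.alpha) / (total + self.alpha * self.alphabet_size)
            total_bits -= math.log2(p)
        return total_bits


def classify(text, human_model, ai_model):
    """Assign the label of whichever model encodes `text` in fewer bits."""
    return "human" if human_model.bits(text) < ai_model.bits(text) else "ai"


# Toy usage with made-up training snippets (real use would train on a corpus).
human = FiniteContextModel(k=2)
ai = FiniteContextModel(k=2)
human.train("honestly i kinda think the movie was ok, idk, the ending dragged a bit lol")
ai.train("In conclusion, the film offers a compelling narrative. "
         "Furthermore, it delivers a compelling message.")
print(classify("Furthermore, the film offers a compelling narrative.", human, ai))
```

Comparing raw bit counts is sufficient here because both models score the same input, so the two code lengths are directly comparable without normalizing by text length.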