In the era of large language models generating high-quality text, methods for detecting machine-generated text are needed, whether to prevent harmful use or simply for annotation purposes. It is equally important, however, to evaluate and compare such methods properly. A few benchmarks have recently been proposed for this purpose; however, integrating the newest detection methods into them is challenging, since new methods appear every month and each comes with a slightly different evaluation pipeline. In this paper, we present the IMGTB framework, which simplifies the benchmarking of machine-generated text detection methods through easy integration of custom (new) methods and evaluation datasets. Its configurability and flexibility make the research and development of new detection methods easier, especially their comparison with existing state-of-the-art detectors. The default set of analyses, metrics, and visualizations offered by the tool follows established practices of machine-generated text detection benchmarking found in the state-of-the-art literature.