Our research addresses the challenge of distinguishing text produced by Large Language Models (LLMs) from human-written text, a capability that matters for a variety of applications. Amid ongoing debate over whether such a detector is attainable, we present evidence supporting its feasibility. We evaluated our models on multiple datasets, including Twitter Sentiment, Football Commentary, Project Gutenberg, PubMedQA, and SQuAD, confirming the efficacy of the enhanced detection approaches. These datasets were sampled under intricate constraints intended to cover every possibility, laying a foundation for future research. We evaluated GPT-3.5-Turbo output against several detectors, namely SVM, RoBERTa-base, and RoBERTa-large. Our findings indicate that detection performance depended predominantly on the sequence length of the input text.