HC3 Plus: A Semantic-Invariant Human ChatGPT Comparison Corpus

ChatGPT has attracted significant interest for its impressive performance, but concern about its potential risks is growing, particularly around detecting AI-generated content (AIGC), which untrained humans often find difficult to identify. Current datasets for detecting ChatGPT-generated text center primarily on question answering and tend to disregard tasks with semantic-invariant properties, such as summarization, translation, and paraphrasing. Our preliminary studies demonstrate that detecting model-generated text on semantic-invariant tasks is more difficult. To fill this gap, we introduce a more extensive and comprehensive dataset that covers more types of tasks than previous work, including semantic-invariant tasks. In addition, models fine-tuned on a large number of task instructions have shown strong performance. Building on that success, we further instruction-fine-tune Tk-instruct and build a more powerful detection system. Experimental results show that our proposed detector outperforms the previous state-of-the-art RoBERTa-based detector.

