Large language models (LLMs) have gained popularity in various fields for their exceptional ability to generate human-like text. Their potential misuse has raised social concerns about plagiarism in academic contexts. However, effectively detecting artificial scientific text is a non-trivial task due to several challenges, including 1) the lack of a clear understanding of the differences between machine-generated and human-written scientific text, 2) the poor generalization performance of existing methods caused by out-of-distribution issues, and 3) the limited support for human-machine collaboration with sufficient interpretability during the detection process. In this paper, we first identify the critical distinctions between machine-generated and human-written scientific text through a quantitative experiment. Then, we propose a mixed-initiative workflow that combines human experts' prior knowledge with machine intelligence, along with a visual analytics prototype to facilitate efficient and trustworthy detection of artificial scientific text. Finally, we demonstrate the effectiveness of our approach through two case studies and a controlled user study with proficient researchers. We also provide design implications for interactive artificial text detection tools in high-stakes decision-making scenarios.