Investigating Novice Researchers' Perceptions of Research Privacy Within LLM-Assisted Workflows

Large language model (LLM)-assisted scholarly workflows introduce critical privacy and intellectual property risks. As a uniquely vulnerable cohort driven by publication pressure and a lack of institutional support, novice researchers rely heavily on public LLMs, which compels them to navigate high-stakes trade-offs between privacy and publication. To investigate these concerns, we conducted semi-structured interviews with 44 researchers across diverse disciplines. Our findings reveal that the fear of idea leakage paradoxically accelerates, rather than deters, reliance on LLMs, as researchers use them to expedite publication. Participants also held two misconceptions: that their ideas lacked the unique value needed to attract targeted attacks, and that their inputs would be safely diluted within massive datasets, preventing reconstruction. From the interviews, we identified five types of mitigation, including input fragmentation and adversarial probing, though participants largely perceived these measures as ineffective. We outline implications, including institution-level sandboxed isolation, scenario-based privacy pedagogy, and verifiable data-deletion audits for transparency.

