Intelligent agents powered by large language models (LLMs) have demonstrated substantial promise in autonomously conducting experiments and facilitating scientific discoveries across various disciplines. Along with this promise, they introduce novel vulnerabilities that demand careful safety consideration. Yet the literature lacks a comprehensive exploration of these vulnerabilities. This position paper addresses that gap by examining vulnerabilities in LLM-based agents within scientific domains, shedding light on the potential risks of their misuse and emphasizing the need for safety measures. We begin with an overview of the potential risks inherent to scientific LLM agents, taking into account user intent, the specific scientific domain, and the potential impact on the external environment. We then examine the origins of these vulnerabilities and provide a scoping review of the limited existing work. Based on our analysis, we propose a triadic framework involving human regulation, agent alignment, and an understanding of environmental feedback (agent regulation) to mitigate the identified risks. Finally, we highlight the limitations and challenges of safeguarding scientific agents and advocate for improved models, robust benchmarks, and comprehensive regulations to address these issues effectively.