Intelligent agents powered by large language models (LLMs) have shown substantial promise in autonomously conducting experiments and facilitating scientific discoveries across disciplines. These capabilities also introduce novel vulnerabilities that demand careful safety consideration, yet the literature lacks a comprehensive exploration of them. This position paper addresses that gap by examining the vulnerabilities of LLM-based agents within scientific domains, highlighting the risks associated with their misuse, and emphasizing the need for safety measures. We begin with an overview of the potential risks inherent to scientific LLM agents, taking into account user intent, the specific scientific domain, and the agents' potential impact on the external environment. We then trace the origins of these vulnerabilities and provide a scoping review of the limited existing work. Based on this analysis, we propose a triadic framework involving human regulation, agent alignment, and an understanding of environmental feedback (agent regulation) to mitigate the identified risks. Finally, we discuss the limitations and challenges of safeguarding scientific agents and advocate for improved models, robust benchmarks, and comprehensive regulations to address these issues effectively.