Artificial intelligence (AI) agents are increasingly used across a variety of domains to automate tasks, interact with users, and make decisions based on data inputs. Ensuring that AI agents perform only authorized actions and handle inputs appropriately is essential for maintaining system integrity and preventing misuse. In this study, we introduce AgentGuardian, a novel security framework that governs and protects AI agent operations by enforcing context-aware access-control policies. During a controlled staging phase, the framework monitors execution traces to learn legitimate agent behaviors and input patterns. From these traces, it derives adaptive policies that regulate tool calls made by the agent, guided by both real-time input context and the control-flow dependencies of multi-step agent actions. Evaluation across two real-world AI agent applications demonstrates that AgentGuardian effectively detects malicious or misleading inputs while preserving normal agent functionality. Moreover, its control-flow-based governance mechanism mitigates hallucination-driven errors and other orchestration-level malfunctions.
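The control-flow-based governance described above could be sketched, in a much-simplified form, as a transition whitelist over tool calls learned from staging-phase execution traces. All names and the whitelist design below are illustrative assumptions for exposition, not AgentGuardian's actual implementation:

```python
# Minimal sketch: during staging, record observed tool-call transitions;
# at runtime, permit a tool call only if the (previous, next) transition
# was seen in a legitimate trace. Names here are hypothetical.

class ToolCallPolicy:
    def __init__(self):
        self.allowed_transitions = set()

    def learn(self, trace):
        """Record tool-call transitions from one staging-phase trace."""
        for prev_tool, next_tool in zip(trace, trace[1:]):
            self.allowed_transitions.add((prev_tool, next_tool))

    def is_allowed(self, prev_tool, next_tool):
        """At runtime, allow only transitions observed during staging."""
        return (prev_tool, next_tool) in self.allowed_transitions


policy = ToolCallPolicy()
# A legitimate multi-step workflow observed in staging (illustrative):
policy.learn(["search_docs", "summarize", "send_reply"])

print(policy.is_allowed("search_docs", "summarize"))   # True: observed
print(policy.is_allowed("search_docs", "send_reply"))  # False: skips a step
```

A real system would additionally condition decisions on input context (as the abstract states), but even this transition check blocks a hallucinated tool call that jumps outside any learned workflow.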