Robotics-Inspired Guardrails for Foundation Models in Socially Sensitive Domains

Foundation models are increasingly deployed in socially sensitive domains such as education, mental health, and caregiving, where failures are often cumulative and context-dependent. Existing guardrail approaches -- ranging from training-time alignment to prompting, decoding constraints, and post-hoc moderation -- primarily provide empirical risk reduction rather than enforceable behavioral guarantees, and largely treat safety as a property of individual outputs rather than interaction trajectories. We reframe guardrails as a problem of runtime behavioral control over interaction trajectories, drawing on robotics to introduce formal constructs for constraint enforcement in uncertain, closed-loop systems. We instantiate these ideas in the Grounded Observer framework and apply it across three real-world deployments: small talk, in-home autism therapy, and behavioral de-escalation in schools. Across settings, the framework enables runtime interventions that mitigate drift into undesirable interaction regimes while adapting to diverse social contexts. We discuss extensions to the framework and propose research directions toward stronger guarantees.

