Large language model chatbots are increasingly deployed in organizational settings such as healthcare, finance, and public services. Evaluating policy alignment is therefore critical to reliable chatbot deployment. By analyzing real-world user queries, we find that composed-policy violations are prevalent across chatbots but overlooked by existing benchmarks. This paper presents COPAL, an automated tool for evaluating composed-policy alignment in chatbots. COPAL efficiently generates queries that trigger composed-policy failures using empirically derived interaction patterns and explicit handling contracts. Queries generated by COPAL expose substantial query-handling failures: across 9 served models, composed-policy queries yield an average error rate of 33.1%, indicating that composed-policy alignment warrants further investigation.