As emotional support chatbots gain traction in both research and industry, a common evaluation strategy has emerged: having help-seeker simulators converse with supporter chatbots. However, current simulators suffer from two critical limitations: (1) they fail to capture the behavioral diversity of real-world seekers, often portraying them as overly cooperative, and (2) they lack the controllability required to simulate specific seeker profiles. To address these challenges, we present a controllable seeker simulator driven by nine psychological and linguistic features that underpin seeker behavior. Trained on authentic Reddit conversations, our model uses a Mixture-of-Experts (MoE) architecture that separates diverse seeker behaviors into specialized parameter subspaces, enabling finer-grained control. Our simulator achieves better profile adherence and greater behavioral diversity than existing approaches. Furthermore, evaluating seven prominent supporter models with our simulator reveals performance degradations that were previously obscured. These findings underscore the value of our framework for more faithful, stress-tested evaluation of emotional support chatbots.
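The abstract does not say how the nine profile features reach the experts. The Python sketch below illustrates one plausible reading, under stated assumptions: the features are encoded as a numeric profile vector, concatenated to a token's hidden state, and fed to a softmax gate that selects the top-k experts (a standard sparse MoE routing pattern). The class name `ProfileConditionedMoE`, the dimensions, and the gating input are hypothetical and are not the authors' implementation.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(logits: Vector) -> Vector:
    """Numerically stable softmax."""
    mx = max(logits)
    exps = [math.exp(z - mx) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class ProfileConditionedMoE:
    """Hypothetical MoE layer whose gate sees both the hidden state and a seeker-profile vector."""

    def __init__(self, num_experts: int, hidden_dim: int, profile_dim: int,
                 top_k: int = 2, seed: int = 0):
        rng = random.Random(seed)
        self.top_k = top_k
        # Each expert is a random linear map hidden_dim -> hidden_dim (stand-in for an FFN).
        self.experts = [
            [[rng.gauss(0, 0.1) for _ in range(hidden_dim)] for _ in range(hidden_dim)]
            for _ in range(num_experts)
        ]
        # Gate maps the concatenated [hidden; profile] vector to one logit per expert.
        self.gate = [
            [rng.gauss(0, 0.1) for _ in range(hidden_dim + profile_dim)]
            for _ in range(num_experts)
        ]

    def route(self, hidden: Vector, profile: Vector) -> List[Tuple[int, float]]:
        """Return (expert index, renormalized weight) for the top-k experts."""
        probs = softmax(matvec(self.gate, hidden + profile))
        top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[: self.top_k]
        norm = sum(probs[i] for i in top)
        return [(i, probs[i] / norm) for i in top]

    def forward(self, hidden: Vector, profile: Vector) -> Vector:
        """Weighted sum of the selected experts' outputs."""
        out = [0.0] * len(hidden)
        for idx, weight in self.route(hidden, profile):
            expert_out = matvec(self.experts[idx], hidden)
            out = [o + weight * e for o, e in zip(out, expert_out)]
        return out


if __name__ == "__main__":
    moe = ProfileConditionedMoE(num_experts=4, hidden_dim=8, profile_dim=9, top_k=2)
    hidden = [0.5] * 8
    # Two illustrative nine-feature profiles with values in [0, 1] (made-up numbers).
    cooperative = [0.9, 0.8, 0.1, 0.2, 0.7, 0.9, 0.1, 0.3, 0.8]
    resistant = [0.1, 0.2, 0.9, 0.8, 0.2, 0.1, 0.9, 0.7, 0.2]
    print("cooperative ->", moe.route(hidden, cooperative))
    print("resistant   ->", moe.route(hidden, resistant))
```

Because the profile enters the gate directly, changing the profile can shift which experts are selected and how they are weighted. This is one way different seeker behaviors could be steered into separate parameter subspaces; the paper's actual routing may condition the gate differently.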