Progress in conversational information access (CIA) systems has been hindered by the difficulty of evaluating such systems with reproducible experiments. While user simulation offers a promising solution, the lack of infrastructure and tooling to support this evaluation paradigm remains a significant barrier. To address this gap, we introduce SimLab, the first cloud-based platform providing a centralized solution for the community to benchmark both conversational systems and user simulators in a controlled and reproducible setting. We articulate the requirements for such a platform and propose a general infrastructure to meet them. We then present the design and implementation of an initial version of SimLab and showcase its features on a first simulation-based evaluation task in conversational movie recommendation. Furthermore, we discuss the platform's sustainability and future opportunities for development, inviting the community to drive further progress in the fields of CIA and user simulation.