User simulators are essential for evaluating search systems, but they primarily replicate user actions without modeling the thought process behind them. This gap exists because large-scale interaction logs record what users do, but not what they might be thinking or feeling, such as confusion or satisfaction. To address this gap, we present a framework that infers cognitive traces from behavior logs. Our method uses a multi-agent system grounded in Information Foraging Theory (IFT) and human expert judgment. The inferred traces improve model performance on tasks such as forecasting session outcomes and recovery from user struggle. We release a collection of annotations for several public datasets, including AOL and Stack Overflow, along with an open-source tool that lets researchers apply our method to their own data. This work provides the tools and data needed to build more human-like user simulators and to assess retrieval systems on user-oriented dimensions of performance.