Intent identification serves as the foundation for generating appropriate responses in personalized question answering (PQA). However, existing benchmarks evaluate only response quality or retrieval performance without directly measuring intent identification capabilities. This gap is critical: without understanding which intents users prioritize, systems cannot generate responses that satisfy individual information needs. To address this, we introduce the concept of core intents: the intents users prioritize when selecting answers to satisfy their information needs. To evaluate core intent identification, we propose IPQA, a benchmark for core Intent identification in Personalized Question Answering. Since users do not explicitly state their prioritized intents, we derive core intents from observable behavioral patterns in answer selection, grounded in satisficing theory, which holds that users choose answers meeting their acceptance thresholds. We construct a dataset spanning diverse domains through systematic filtering, LLM-based annotation, and rigorous quality control that combines automated verification with human validation. Experimental evaluations across state-of-the-art language models reveal that current systems struggle with core intent identification in personalized contexts: models fail to identify core intents from user histories, and performance degrades as question complexity increases. The code and dataset will be made publicly available to facilitate future research in this direction.