Qualitative insights from user experiences are critical for informing product and policy decisions, but collecting such data at scale is constrained by the time and availability of experts to conduct semi-structured interviews. Recent work has explored using large language models (LLMs) to automate interviewing, yet existing systems lack a principled mechanism for balancing systematic coverage of predefined topics with adaptive exploration, that is, the ability to pursue follow-ups, deep dives, and emergent themes that arise organically during conversation. In this work, we formulate adaptive semi-structured interviewing as an optimization problem over the interviewer's behavior. We define interview utility as a trade-off between coverage of a predefined interview topic guide, discovery of relevant emergent themes, and interview cost measured by length. Based on this formulation, we introduce SparkMe, a multi-agent LLM interviewer that performs deliberative planning via simulated conversation rollouts to select questions with high expected utility. We evaluate SparkMe through controlled experiments with LLM-based interviewees, showing that it achieves higher interview utility, improving topic-guide coverage (+4.7% over the best baseline) and eliciting richer emergent insights while using fewer conversational turns than prior LLM interviewing approaches. We further validate SparkMe in a user study with 70 participants across 7 professions on the impact of AI on their workflows. Domain experts rate SparkMe as producing high-quality adaptive interviews that surface helpful profession-specific insights not captured by prior approaches. The code, datasets, and evaluation protocols for SparkMe are available as open source at https://github.com/SALT-NLP/SparkMe.
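To make the utility formulation above concrete, the following is a minimal illustrative sketch of how a rollout-based interviewer might score and select candidate questions. It is not SparkMe's implementation: the helper `simulate_rollout`, the `RolloutOutcome` fields, and all weights are hypothetical assumptions introduced only for illustration.

```python
# Illustrative sketch of the coverage/emergence/cost trade-off and
# rollout-based question selection described in the abstract.
# All names, weights, and helpers here are hypothetical, not SparkMe's API.
from dataclasses import dataclass
from typing import Callable, Iterable

@dataclass
class RolloutOutcome:
    guide_coverage: float   # fraction of topic-guide items addressed (0..1)
    emergent_themes: int    # count of relevant themes not in the guide
    turns: int              # conversational turns consumed

def utility(o: RolloutOutcome,
            w_cov: float = 1.0, w_emg: float = 0.3, w_cost: float = 0.05) -> float:
    """Interview utility: reward coverage and emergent discovery, penalize length."""
    return w_cov * o.guide_coverage + w_emg * o.emergent_themes - w_cost * o.turns

def select_question(candidates: Iterable[str],
                    simulate_rollout: Callable[[str], RolloutOutcome],
                    n_rollouts: int = 3) -> str:
    """Deliberative planning: pick the candidate question with the highest
    mean utility over simulated conversation rollouts."""
    def expected_utility(q: str) -> float:
        outcomes = [simulate_rollout(q) for _ in range(n_rollouts)]
        return sum(utility(o) for o in outcomes) / len(outcomes)
    return max(candidates, key=expected_utility)
```

In this sketch, `simulate_rollout` stands in for an LLM-driven simulation of how the conversation might unfold after asking a given question; the weights simply encode one possible balance between the three terms of the utility.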