Using Large Language Models to Generate, Validate, and Apply User Intent Taxonomies

Chirag Shah, Ryen W. White, Reid Andersen, Georg Buscher, Scott Counts, Sarkar Snigdha Sarathi Das, Ali Montazer, Sathish Manivannan, Jennifer Neville, Xiaochuan Ni, Nagu Rangan, Tara Safavi, Siddharth Suri, Mengting Wan, Leijie Wang, Longqi Yang

Log data can reveal valuable information about how users interact with Web search services, what they want, and how satisfied they are. However, analyzing user intents in log data is not easy, especially for emerging forms of Web search such as AI-driven chat. To understand user intents from log data, we need a way to label them with meaningful categories that capture their diversity and dynamics. Existing methods rely on manual or machine-learned labeling, which is either expensive or inflexible for large and dynamic datasets. We propose a novel solution using large language models (LLMs), which can generate rich and relevant concepts, descriptions, and examples for user intents. However, using LLMs to generate a user intent taxonomy and apply it to log analysis can be problematic for two main reasons: (1) such a taxonomy is not externally validated; and (2) there may be an undesirable feedback loop. To address these concerns, we propose a new methodology in which human experts and assessors verify the quality of the LLM-generated taxonomy. We also present an end-to-end pipeline that uses an LLM with a human in the loop to produce, refine, and apply labels for user intent analysis in log data. We demonstrate its effectiveness by uncovering new insights into user intents from search and chat logs from the Microsoft Bing commercial search engine. The novelty of this work lies in its method for generating purpose-driven user intent taxonomies with strong validation. This method not only helps remove methodological and practical bottlenecks from intent-focused research, but also provides a new framework for generating, validating, and applying other kinds of taxonomies in a scalable and adaptable way with minimal human effort.
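One way to picture the validation step is as an agreement check between labels assigned by human assessors and labels assigned by an LLM over the same log sample. The sketch below is only an illustration under stated assumptions: the abstract does not name an agreement statistic, so Cohen's kappa is used here as a common choice, and the category names (`info_seeking`, `creation`) and the sample labels are hypothetical, not taken from the paper's taxonomy or data.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two label sequences.

    Assumption: Cohen's kappa stands in for whatever agreement
    measure a real validation study would choose.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(labels_a)
    # Fraction of items on which the two annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical intent labels for four log entries (illustrative only).
human_labels = ["info_seeking", "info_seeking", "creation", "creation"]
llm_labels = ["info_seeking", "info_seeking", "creation", "info_seeking"]

kappa = cohen_kappa(human_labels, llm_labels)
print(f"kappa = {kappa:.2f}")
```

In a human-in-the-loop setting, a low score on such a check could prompt reviewers to refine category definitions before the taxonomy is applied at scale; what counts as acceptable agreement is a judgment call that this sketch does not settle.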
