E-commerce search is powered at its core by a structured representation of the inventory, often formulated as a category taxonomy. An important capability for e-commerce with hierarchical taxonomies is selecting a set of relevant leaf categories that are semantically aligned with a given user query. In this context, we address the fundamental problem of search query categorization in real-world e-commerce taxonomies. A correct categorization of a query not only provides a way to zoom into the correct inventory space, but also opens the door to multiple intent-understanding capabilities for that query. A practical and accurate solution to this problem has many applications in e-commerce, including constraining retrieved items and improving the relevance of search results. For this task, we explore a novel Chain-of-Thought (CoT) paradigm that combines simple tree search with LLM semantic scoring. Assessing its classification performance on human-judged query-category pairs, relevance tests, and LLM-based reference methods, we find that the CoT approach outperforms a benchmark that uses embedding-based query-category predictions. We also show how the CoT approach can detect problems within a hierarchical taxonomy. Finally, we propose LLM-based approaches to query categorization in the same spirit that scale better to millions of queries.
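The combination of tree search and semantic scoring described above can be illustrated with a minimal sketch. The abstract does not specify the search procedure, the prompts, or the scoring scale, so everything here is an assumption made for illustration: the `Category` type, the `categorize` function, the `beam_width` and `min_score` parameters, and the toy taxonomy are all hypothetical. In particular, `toy_scorer` is a keyword lookup standing in for an LLM call that would rate how well a category path matches the query.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Category:
    """A node in a hypothetical category taxonomy."""
    name: str
    children: List["Category"] = field(default_factory=list)

    @property
    def is_leaf(self) -> bool:
        return not self.children


# Assumed scorer signature: (query, category path) -> relevance in [0, 1].
# In an LLM-based system this would wrap a model call; here it is a stub.
Scorer = Callable[[str, List[str]], float]

# Hand-written keywords per category, used only by the toy scorer below.
TOY_KEYWORDS = {
    "Electronics": {"phone", "case", "laptop"},
    "Phones": {"phone", "case"},
    "Phone Cases": {"phone", "case"},
    "Smartphones": {"phone"},
    "Laptops": {"laptop"},
    "Home": {"kitchen", "knife"},
    "Kitchen Knives": {"kitchen", "knife"},
}


def toy_scorer(query: str, path: List[str]) -> float:
    """Stand-in for LLM scoring: share of query tokens known to the last category."""
    tokens = set(query.lower().split())
    if not tokens:
        return 0.0
    keywords = TOY_KEYWORDS.get(path[-1], set())
    return len(tokens & keywords) / len(tokens)


def categorize(
    query: str,
    root: Category,
    scorer: Scorer,
    beam_width: int = 2,
    min_score: float = 0.0,
) -> List[Tuple[List[str], float]]:
    """Descend the taxonomy level by level, keeping the best-scoring children.

    Children scoring at or below min_score are pruned; at most beam_width
    candidates survive per level. Returns (leaf path, score), best first.
    """
    frontier = [(root, [root.name])]
    leaves: List[Tuple[List[str], float]] = []
    while frontier:
        candidates = []
        for node, path in frontier:
            for child in node.children:
                child_path = path + [child.name]
                score = scorer(query, child_path)
                if score > min_score:
                    candidates.append((score, child, child_path))
        candidates.sort(key=lambda c: c[0], reverse=True)
        frontier = []
        for score, child, child_path in candidates[:beam_width]:
            if child.is_leaf:
                leaves.append((child_path, score))
            else:
                frontier.append((child, child_path))
    return sorted(leaves, key=lambda item: item[1], reverse=True)


def build_toy_taxonomy() -> Category:
    """Small illustrative taxonomy; not taken from any real catalog."""
    phones = Category("Phones", [Category("Phone Cases"), Category("Smartphones")])
    electronics = Category("Electronics", [phones, Category("Laptops")])
    home = Category("Home", [Category("Kitchen Knives")])
    return Category("All", [electronics, home])


if __name__ == "__main__":
    for path, score in categorize("phone case", build_toy_taxonomy(), toy_scorer):
        print(" > ".join(path), score)
```

Pruning whole branches at each level is what keeps the number of scoring calls small relative to scoring every leaf, at the cost that a low score near the root can hide a relevant leaf. Whether the approach in the paper makes the same trade-off is not stated in the abstract.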