String matching algorithms in the presence of abbreviations, such as in Stock Keeping Unit (SKU) product catalogs, remain a relatively unexplored topic. In this paper, we present a unified architecture for SKU search that provides both a real-time suggestion system (based on a Trie data structure) and a lower-latency search system (combining character-level TF-IDF with language model vector embeddings) in which users initiate the search process explicitly. We carry out ablation studies that justify designing a complex search system composed of multiple components to address the delicate trade-off between speed and accuracy. Using SKU search in the Dynamics CRM as an example, we show how our system vastly outperforms the default search engine in all aspects. Finally, we show how SKU descriptions may be enhanced via generative text models (using gpt-3.5-turbo) so that users get more context and a generally better experience when presented with the results of their SKU search.
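As a rough illustration of the two retrieval components named above, the following minimal sketch pairs a Trie for prefix suggestions with a character-trigram TF-IDF index ranked by cosine similarity. It is not the system described in this paper: the class names, the trigram size, the smoothed IDF formula, and the catalog entries are all assumptions chosen for illustration, and the language model embedding component is omitted entirely.

```python
import math
from collections import Counter


class Trie:
    """Prefix tree used for real-time suggestions (hypothetical helper)."""

    def __init__(self):
        self.children = {}
        self.is_end = False

    def insert(self, word):
        node = self
        for ch in word:
            node = node.children.setdefault(ch, Trie())
        node.is_end = True

    def suggest(self, prefix, limit=5):
        """Return up to `limit` stored strings that start with `prefix`."""
        node = self
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        results, stack = [], [(node, prefix)]
        while stack and len(results) < limit:
            current, text = stack.pop()
            if current.is_end:
                results.append(text)
            # Push children in reverse order so the smallest character pops first.
            for ch in sorted(current.children, reverse=True):
                stack.append((current.children[ch], text + ch))
        return results


def char_ngrams(text, n=3):
    """Character n-grams of the lowercased text, padded with one space per side."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class CharTfidfIndex:
    """Character-level TF-IDF index with cosine ranking (assumed design)."""

    def __init__(self, docs, n=3):
        self.docs = list(docs)
        self.n = n
        doc_freq = Counter()
        for doc in self.docs:
            doc_freq.update(set(char_ngrams(doc, n)))
        total = len(self.docs)
        # Smoothed IDF; the exact weighting is an assumption of this sketch.
        self.idf = {g: math.log((1 + total) / (1 + df)) + 1.0
                    for g, df in doc_freq.items()}
        self.vectors = [self._vectorize(doc) for doc in self.docs]

    def _vectorize(self, text):
        counts = Counter(char_ngrams(text, self.n))
        vec = {g: c * self.idf[g] for g, c in counts.items() if g in self.idf}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        return {g: w / norm for g, w in vec.items()} if norm else {}

    def search(self, query, top_k=3):
        """Rank documents by cosine similarity to the query; drop zero scores."""
        q = self._vectorize(query)
        scored = []
        for doc, vec in zip(self.docs, self.vectors):
            score = sum(w * vec.get(g, 0.0) for g, w in q.items())
            if score > 0:
                scored.append((score, doc))
        scored.sort(key=lambda item: (-item[0], item[1]))
        return [doc for _, doc in scored[:top_k]]


# Hypothetical catalog entries, for illustration only.
catalog = ["Office 365 E3", "Office 365 E5", "Dynamics 365 Sales", "Azure SQL Database"]

trie = Trie()
for name in catalog:
    trie.insert(name.lower())
index = CharTfidfIndex(catalog)

suggestions = trie.suggest("off")      # prefix lookup as the user types
matches = index.search("dynmcs sales")  # abbreviated query, explicit search
```

The Trie only answers exact-prefix queries, which keeps it fast enough to run on every keystroke, while the trigram index tolerates dropped characters because an abbreviated query still shares some trigrams with the full SKU name.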