Spellchecking is one of the most fundamental and widely used search features. Correcting misspelled user queries not only enhances the user experience but is something users expect. However, most widely available spellchecking solutions are either less accurate than state-of-the-art solutions or too slow for search use cases, where latency is a key requirement. Furthermore, most recent innovative architectures focus on English rather than being trained multilingually, and they are trained for spell correction in longer text. This is a different paradigm from spell correction for user queries, where context is sparse (most queries are 1-2 words long). Finally, since most enterprises have unique vocabularies such as product names, off-the-shelf spelling solutions fall short of users' needs. In this work, we build a multilingual spellchecker that is extremely fast and scalable and that adapts its vocabulary, and hence its output, to a specific product's needs. Furthermore, our speller outperforms general-purpose spellers by a wide margin on in-domain datasets. Our multilingual speller is used in search in Adobe products, powering autocomplete in various applications.