Spellchecking is one of the most fundamental and widely used search features. Correcting misspelled user queries not only enhances the user experience but is also expected by users. However, most widely available spellchecking solutions are either less accurate than state-of-the-art solutions or too slow for search use cases, where latency is a key requirement. Furthermore, most recent innovative architectures focus on English, are not trained in a multilingual fashion, and are trained for spell correction in longer text. This is a different paradigm from spell correction for user queries, where context is sparse (most queries are 1-2 words long). Finally, since most enterprises have unique vocabularies, such as product names, off-the-shelf spelling solutions fall short of users' needs. In this work, we build a multilingual spellchecker that is extremely fast and scalable and that adapts its vocabulary, and hence its output, to a specific product's needs. In addition, our speller outperforms general-purpose spellers by a wide margin on in-domain datasets. Our multilingual speller is used in search in Adobe products, powering autocomplete in various applications.