Towards Better Query Classification with Multi-Expert Knowledge Condensation in JD Ads Search

Search query classification, as an effective way to understand user intent, is of great importance in real-world online ads systems. To ensure low latency, a shallow model (e.g., FastText) is widely used for efficient online inference. However, the representation ability of FastText is limited, resulting in poor classification performance, especially on low-frequency queries and long-tail categories. Using a deeper and more complex model (e.g., BERT) is an effective solution, but it incurs higher online inference latency and greater computing costs. Balancing inference efficiency and classification performance is therefore of great practical importance. To address this challenge, we propose knowledge condensation (KC), a simple yet effective knowledge distillation framework that boosts the classification performance of the online FastText model under strict low-latency constraints. Specifically, we train an offline BERT model to retrieve more potentially relevant data. Owing to its powerful semantic representation, relevant labels that are not exposed in the historical data can be added to the training set for better FastText model training. Moreover, we propose a novel distribution-diverse multi-expert learning strategy to further improve the ability to mine relevant data. Multiple BERT models are trained on different data distributions so that they perform better on high-, middle-, and low-frequency search queries, respectively. Ensembling these models across distributions makes retrieval more powerful. We have deployed two versions of this framework in JD search, and both offline experiments and online A/B testing on multiple datasets have validated the effectiveness of the proposed approach.

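The label-mining step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the experts here are toy lookup tables standing in for offline BERT models trained on different data distributions, and the averaging rule, the `threshold` value, the category names, and all function names (`ensemble_scores`, `condense`, `to_fasttext_lines`) are assumptions made for illustration. The only external convention relied on is the `__label__` prefix used by FastText's supervised training format.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set

# An "expert" maps a query to per-category relevance scores in [0, 1].
# In the paper these would be offline BERT models; here they are toy stand-ins.
Expert = Callable[[str], Dict[str, float]]


def make_toy_expert(table: Dict[str, Dict[str, float]]) -> Expert:
    """Build a hypothetical expert backed by a fixed lookup table."""
    return lambda query: table.get(query, {})


def ensemble_scores(query: str, experts: List[Expert]) -> Dict[str, float]:
    """Average category scores over all experts (assumed combination rule)."""
    if not experts:
        raise ValueError("at least one expert is required")
    totals: Dict[str, float] = defaultdict(float)
    for expert in experts:
        for category, score in expert(query).items():
            totals[category] += score
    return {category: total / len(experts) for category, total in totals.items()}


def condense(
    history: Dict[str, Set[str]], experts: List[Expert], threshold: float = 0.5
) -> Dict[str, Set[str]]:
    """Union each query's historical labels with labels the ensemble scores >= threshold."""
    augmented: Dict[str, Set[str]] = {}
    for query, labels in history.items():
        scores = ensemble_scores(query, experts)
        mined = {category for category, score in scores.items() if score >= threshold}
        augmented[query] = set(labels) | mined
    return augmented


def to_fasttext_lines(dataset: Dict[str, Set[str]]) -> List[str]:
    """Serialize to FastText supervised format: '__label__a __label__b query text'."""
    lines = []
    for query, labels in sorted(dataset.items()):
        prefix = " ".join(f"__label__{label}" for label in sorted(labels))
        lines.append(f"{prefix} {query}")
    return lines


# Two toy experts, nominally specialized on high- and low-frequency queries.
high_freq_expert = make_toy_expert({
    "iphone case": {"phone_accessories": 0.9, "phones": 0.6},
    "hiking pole": {"outdoor_gear": 0.3},
})
low_freq_expert = make_toy_expert({
    "iphone case": {"phone_accessories": 0.8, "phones": 0.2},
    "hiking pole": {"outdoor_gear": 0.9},
})

# Historical (click-derived) labels; relevant categories may be missing.
history = {"iphone case": {"phones"}, "hiking pole": {"sports"}}

augmented = condense(history, [high_freq_expert, low_freq_expert], threshold=0.5)
lines = to_fasttext_lines(augmented)
for line in lines:
    print(line)
```

The resulting lines could then be written to a file and used to retrain the online FastText classifier, so that the extra cost of the BERT experts is paid offline only. Averaging is just one possible way to combine experts; a weighted vote or a per-frequency-bucket routing rule would fit the same interface.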