Towards Better Query Classification with Multi-Expert Knowledge Condensation in JD Ads Search

Search query classification is an effective way to understand user intent and is of great importance in real-world online ads systems. To keep latency low, a shallow model (e.g., FastText) is widely used for efficient online inference. However, the representation ability of FastText is limited, which leads to poor classification performance, especially on low-frequency queries and long-tail categories. Using a deeper and more complex model (e.g., BERT) is an effective solution, but it incurs higher online inference latency and greater computing cost. Balancing inference efficiency against classification performance is therefore of great practical importance. To address this challenge, we propose knowledge condensation (KC), a simple yet effective knowledge distillation framework that boosts the classification performance of the online FastText model under strict low-latency constraints. Specifically, we train an offline BERT model to retrieve more potentially relevant data. Thanks to its powerful semantic representation, relevant labels that were not exposed in the historical data are added to the training set, which improves FastText training. Moreover, we propose a novel distribution-diverse multi-expert learning strategy to further improve the ability to mine relevant data. Multiple BERT models are trained on different data distributions, so that each performs better on high-, middle-, or low-frequency search queries, and ensembling these models across distributions makes retrieval more powerful. We have deployed two versions of this framework in JD search, and both offline experiments and online A/B testing on multiple datasets have validated the effectiveness of the proposed approach.
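The label-augmentation step described above can be illustrated with a minimal sketch. It is not the deployed implementation: the abstract does not say how expert outputs are combined or filtered, so the score averaging, the 0.5 threshold, the `Expert` interface, and the toy `head_expert` and `tail_expert` functions are all assumptions made for illustration. In practice each expert would be an offline BERT classifier trained on a different data distribution. The only externally grounded detail is the `__label__` prefix, which is the line format fastText expects for supervised training data.

```python
from typing import Callable, Dict, List, Set

# Hypothetical interface: an expert maps a query to per-category relevance scores.
# In the paper's setting each expert would be an offline BERT model.
Expert = Callable[[str], Dict[str, float]]


def condense_labels(
    queries: List[str],
    historical: Dict[str, Set[str]],
    experts: List[Expert],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> Dict[str, Set[str]]:
    """Add categories that the expert ensemble scores as relevant to the historical labels."""
    augmented: Dict[str, Set[str]] = {}
    for query in queries:
        labels = set(historical.get(query, set()))
        # Sum each category's score over all experts, then average
        # (averaging is one simple ensembling choice among many).
        totals: Dict[str, float] = {}
        for expert in experts:
            for category, score in expert(query).items():
                totals[category] = totals.get(category, 0.0) + score
        for category, total in totals.items():
            if total / len(experts) >= threshold:
                labels.add(category)
        augmented[query] = labels
    return augmented


def to_fasttext_lines(augmented: Dict[str, Set[str]]) -> List[str]:
    """Serialize queries and labels in fastText's supervised training format."""
    lines = []
    for query, labels in augmented.items():
        prefix = " ".join(f"__label__{c}" for c in sorted(labels))
        lines.append(f"{prefix} {query}")
    return lines


# Toy stand-ins for experts trained on head and tail query distributions.
def head_expert(query: str) -> Dict[str, float]:
    return {"phones": 0.9, "phone_cases": 0.4}


def tail_expert(query: str) -> Dict[str, float]:
    return {"phones": 0.7, "phone_cases": 0.8}
```

With `historical = {"iphone 13": {"phones"}}`, calling `condense_labels(["iphone 13"], historical, [head_expert, tail_expert])` adds `phone_cases` to the query's labels, because the averaged score clears the threshold even though the head expert alone would have rejected it. The resulting lines can then be written to a file and used to retrain the online FastText model.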
