GARCIA: Powering Representations of Long-tail Query with Multi-granularity Contrastive Learning

In recent years, the growth of service platforms has brought great convenience to both users and merchants, and the service search engine plays a vital role in improving the user experience by quickly returning desirable results for textual queries. Unfortunately, users' uncontrollable search habits produce vast numbers of long-tail queries, which severely strain the capability of search models. Inspired by recently emerging graph neural networks (GNNs) and contrastive learning (CL), several efforts have been made to alleviate the long-tail issue and have achieved considerable performance. Nevertheless, they still face a few major weaknesses. Most importantly, they do not explicitly utilize the contextual structure between heads and tails for effective knowledge transfer, and they commonly ignore intention-level information, which could yield more generalized representations. To this end, we develop a novel framework, GARCIA, which exploits graph-based knowledge transfer and intention-based representation generalization in a contrastive setting. In particular, we employ an adaptive encoder to produce informative representations for queries and services, as well as hierarchical-structure-aware representations of intentions. To fully understand tail queries and services, we equip GARCIA with a novel multi-granularity contrastive learning module, which powers representations through knowledge transfer, structure enhancement, and intention generalization. Subsequently, the complete GARCIA is trained in a pre-training and fine-tuning manner. Finally, we conduct extensive experiments in both offline and online environments, which demonstrate the superior capability of GARCIA in improving tail-query and overall performance in service search scenarios.
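
To make the idea of a multi-granularity contrastive objective concrete, the following is a minimal sketch and not the actual GARCIA loss, which the abstract does not specify. It assumes an InfoNCE-style loss (a common choice in contrastive learning) applied at three granularities and combined as a weighted sum. The function names `info_nce` and `multi_granularity_loss`, the argument names, the temperature, and the weights are all hypothetical.

```python
import math


def _normalize(v):
    """Scale a non-zero vector to unit length (assumption: no zero vectors)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE loss over a batch of embedding pairs.

    anchors[i] is paired with positives[i]; every other row of
    `positives` serves as an in-batch negative for anchors[i].
    """
    a = [_normalize(v) for v in anchors]
    p = [_normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        # Cosine similarity of anchor i to every candidate, scaled by temperature.
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability of picking the true positive.
        total += log_denominator - logits[i]
    return total / len(a)


def multi_granularity_loss(tail, head, tail_view, intent, weights=(1.0, 1.0, 1.0)):
    """Hypothetical weighted sum of three contrastive terms.

    tail      : embeddings of tail queries
    head      : embeddings of head queries paired with each tail (knowledge transfer)
    tail_view : structure-augmented views of the same tails (structure enhancement)
    intent    : embeddings of each tail's intention (intention generalization)
    """
    w_kt, w_se, w_ig = weights
    return (w_kt * info_nce(tail, head)
            + w_se * info_nce(tail, tail_view)
            + w_ig * info_nce(tail, intent))


# Toy 2-D embeddings, invented purely for illustration.
tail = [[1.0, 0.0], [0.0, 1.0]]
head = [[0.9, 0.1], [0.1, 0.9]]
tail_view = [[0.8, 0.2], [0.2, 0.8]]
intent = [[1.0, 0.1], [0.1, 1.0]]
loss = multi_granularity_loss(tail, head, tail_view, intent)
```

In this sketch, each term pulls a tail representation toward its paired counterpart and pushes it away from the other items in the batch. How the pairs are actually constructed (for example, which head query is matched to which tail) is not stated in the abstract and is left as an input here.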
