In light of the success of pre-trained language models (PLMs), continual pre-training of generic PLMs has become the standard paradigm for domain adaptation. In this paper, we propose QUERT: A Continual Pre-trained Language Model for QUERy Understanding in Travel Domain Search. QUERT is jointly trained on four pre-training tasks tailored to the characteristics of queries in travel domain search: Geography-aware Mask Prediction, Geohash Code Prediction, User Click Behavior Learning, and Phrase and Token Order Prediction. Performance improvements on downstream tasks and ablation experiments demonstrate the effectiveness of the proposed pre-training tasks. Specifically, the average performance on downstream tasks increases by 2.02% and 30.93% in the supervised and unsupervised settings, respectively. To assess QUERT's impact on the online business, we deploy it and run A/B testing on the Fliggy app. The results show that, when QUERT is used as the encoder, the Unique Click-Through Rate and Page Click-Through Rate increase by 0.89% and 1.03%, respectively. Our code and downstream task data will be released for future research.
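The abstract does not describe how QUERT encodes locations for Geohash Code Prediction. As background only, the sketch below shows the standard geohash encoding, which alternately bisects the longitude and latitude ranges and packs each group of 5 resulting bits into one base-32 character. The function name `geohash_encode` is hypothetical, and how QUERT derives or predicts these codes is not shown here.

```python
# Base-32 alphabet of the standard geohash scheme (omits a, i, l, o).
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lon: float, precision: int = 6) -> str:
    """Encode a (lat, lon) pair as a geohash of `precision` characters.

    Bits alternate between longitude and latitude, starting with longitude;
    each bit records whether the coordinate lies in the upper half of the
    current interval. Every 5 bits become one base-32 character.
    """
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0      # value of the current 5-bit group
    n_bits = 0    # number of bits collected in the current group
    even = True   # True: refine longitude; False: refine latitude
    while len(chars) < precision:
        if even:
            mid = (lon_lo + lon_hi) / 2
            if lon > mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat > mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        even = not even
        n_bits += 1
        if n_bits == 5:
            chars.append(_BASE32[bits])
            bits, n_bits = 0, 0
    return "".join(chars)


if __name__ == "__main__":
    print(geohash_encode(42.6, -5.6, precision=5))  # ezs42
    # A shorter code is a prefix of the longer one and names a larger enclosing cell.
    print(geohash_encode(42.6, -5.6, precision=3))  # ezs
```

Because each character refines the cell named by the previous ones, points that share a longer prefix lie in the same smaller cell. This makes a geohash a convenient discrete, hierarchical target for location-aware training, although the paper's exact formulation may differ.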