In this paper, we propose a robust multilingual model to improve the quality of search results. Our model not only leverages a processed, class-balanced dataset but also benefits from multitask pre-training, which leads to more general representations. In the pre-training stage, we adopt a masked language modeling (MLM) task, a classification task, and a contrastive learning task to achieve considerable performance improvements. In the fine-tuning stage, we use confident learning, the exponential moving average (EMA) method, adversarial training (FGM), and a regularized dropout strategy (R-Drop) to improve the model's generalization and robustness. Moreover, we use a multi-granular semantic unit to discover the textual metadata of queries and products, enhancing the model's representations. Our approach obtained competitive results and ranked in the top 8 in three tasks. We release the source code and pre-trained models associated with this work.
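To make two of the fine-tuning components concrete, the following is a minimal sketch of an EMA weight update and an R-Drop style loss in plain Python. It is an illustration under stated assumptions, not the released implementation: the function names (`ema_update`, `rdrop_loss`), the default `decay` and `alpha` values, and the use of flat lists of floats in place of model tensors are all hypothetical. The R-Drop term follows the commonly cited formulation, which adds a symmetric KL divergence between two dropout-perturbed forward passes to the sum of their cross-entropy losses.

```python
import math


def ema_update(shadow, params, decay=0.999):
    """One EMA step: shadow <- decay * shadow + (1 - decay) * params.

    `shadow` and `params` are flat lists of floats standing in for
    model weights (an assumption made for illustration only).
    """
    return [decay * s + (1.0 - decay) * p for s, p in zip(shadow, params)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_div(p, q):
    """KL(p || q) for two discrete distributions with q > 0 everywhere."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def rdrop_loss(logits_a, logits_b, label, alpha=1.0):
    """R-Drop style loss for one example.

    `logits_a` and `logits_b` are the outputs of two forward passes of the
    same input under different dropout masks. `alpha` (hypothetical default)
    weights the symmetric KL consistency term.
    """
    p = softmax(logits_a)
    q = softmax(logits_b)
    # Sum of the cross-entropy losses of the two passes.
    ce = -(math.log(p[label]) + math.log(q[label]))
    # Symmetric KL divergence between the two predictive distributions.
    sym_kl = 0.5 * (kl_div(p, q) + kl_div(q, p))
    return ce + alpha * sym_kl


if __name__ == "__main__":
    shadow = ema_update([1.0, 2.0], [0.0, 0.0], decay=0.9)
    print("EMA weights:", shadow)
    print("R-Drop loss:", rdrop_loss([2.0, 0.0], [1.5, 0.5], label=0))
```

In a real training loop, the shadow weights would typically be updated after each optimizer step and swapped in for evaluation, while the two sets of logits would come from running the same batch through the model twice with dropout enabled.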