This technical report briefly describes Vega v1, the submission of our JDExplore d-team to the General Language Understanding Evaluation (GLUE) leaderboard. GLUE is a collection of nine natural language understanding tasks, including question answering, linguistic acceptability, sentiment analysis, text similarity, paraphrase detection, and natural language inference. [Method] We investigate several effective strategies and adopt their best combination as our training recipe. For the model structure, we use the vanilla Transformer with disentangled attention as the basic encoder block. For self-supervised training, we use a representative denoising objective (replaced token detection) in phase 1 and combine it with a contrastive objective (sentence embedding contrastive learning) in phase 2. During fine-tuning, we adopt several advanced techniques, such as transductive fine-tuning, self-calibrated fine-tuning, and adversarial fine-tuning. [Results] According to our submission record (Jan. 2022), with our optimized pretraining and fine-tuning strategies, our 1.3-billion-parameter model sets a new state of the art on 4 of the 9 tasks and achieves the best average score of 91.3. Encouragingly, Vega v1 is the first to exceed human performance on two challenging tasks, SST-2 and WNLI. We believe our empirically successful recipe, built from a bag of tricks, could shed new light on developing efficient discriminative large language models.
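To make the two-phase pretraining objective concrete, the following is a minimal sketch of how a replaced token detection loss and a sentence embedding contrastive loss might be combined. It is not the authors' implementation: the function names, the InfoNCE form of the contrastive term, the `temperature` value, and the `weight` on the contrastive term are all assumptions made for illustration, since the abstract does not specify them.

```python
import math


def rtd_loss(logits, is_replaced):
    """Mean binary cross-entropy for replaced token detection.

    logits[i] is the discriminator score for token i; is_replaced[i] is 1
    if the token was swapped in by a generator and 0 if it is original.
    """
    total = 0.0
    for z, y in zip(logits, is_replaced):
        # Numerically stable form of BCE with logits.
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


def _cosine(u, v):
    # Assumes both embeddings are non-zero vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(anchors, positives, temperature=0.05):
    """InfoNCE loss (an assumed choice) over sentence embeddings.

    positives[i] is the positive for anchors[i]; every other row of
    `positives` acts as an in-batch negative.
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        sims = [_cosine(anchor, p) / temperature for p in positives]
        peak = max(sims)
        log_denominator = peak + math.log(sum(math.exp(s - peak) for s in sims))
        total += log_denominator - sims[i]
    return total / len(anchors)


def phase1_loss(logits, is_replaced):
    # Phase 1 uses only the denoising (replaced token detection) objective.
    return rtd_loss(logits, is_replaced)


def phase2_loss(logits, is_replaced, anchors, positives, weight=1.0):
    # Phase 2 adds the contrastive term; `weight` is a hypothetical knob.
    return rtd_loss(logits, is_replaced) + weight * contrastive_loss(anchors, positives)
```

In a real training setup these quantities would be computed on batched tensors in a deep learning framework; plain Python lists are used here only to keep the sketch self-contained.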