Natural language processing models such as BERT can provide state-of-the-art word embeddings for downstream NLP tasks. However, these models have yet to perform well on Semantic Textual Similarity (STS), and they may be too large to deploy in lightweight edge applications. To address both issues, we apply a contrastive learning method based on the SimCSE paper to a model architecture adapted from DistilBERT, a model built with knowledge distillation. Our final lightweight model, DistilFace, achieves an average Spearman's correlation of 72.1 on STS tasks, a 34.2 percent improvement over BERT base.
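To make the two technical ingredients above concrete, the following is a minimal sketch, not the DistilFace implementation. It assumes a SimCSE-style objective with in-batch negatives, where `views_a[i]` and `views_b[i]` are two embeddings of the same sentence (in unsupervised SimCSE these come from two forward passes with different dropout masks), and it computes Spearman's correlation the way STS scores are typically evaluated. The function names, the toy vectors, and the temperature of 0.05 (the SimCSE default) are illustrative assumptions; the actual training setup may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(views_a, views_b, temperature=0.05):
    """Contrastive (InfoNCE) loss with in-batch negatives.

    For sentence i, views_b[i] is the positive and every other
    views_b[j] in the batch acts as a negative. The temperature
    value is an assumption taken from the SimCSE default.
    """
    n = len(views_a)
    total = 0.0
    for i in range(n):
        logits = [cosine(views_a[i], views_b[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp over the batch.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        # Cross-entropy with the matching index i as the target class.
        total += log_sum_exp - logits[i]
    return total / n


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (std_x * std_y)


if __name__ == "__main__":
    # Toy embeddings (made up for illustration): aligned pairs give a low loss.
    a = [[1.0, 0.0], [0.0, 1.0]]
    b = [[0.9, 0.1], [0.1, 0.9]]
    print(contrastive_loss(a, b))
    # Predicted similarities vs. hypothetical human scores for four pairs.
    print(spearman([0.1, 0.4, 0.35, 0.8], [1.0, 3.0, 2.5, 5.0]))
```

In an STS evaluation of this kind, the predicted score for each sentence pair would be the cosine similarity of the two sentence embeddings, and Spearman's correlation is computed against the human-annotated scores. A reported value such as 72.1 corresponds to the correlation multiplied by 100.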