Joint Embedding Predictive Architectures (JEPAs) are a novel self-supervised training technique that has recently shown promise across domains. We introduce BERT-JEPA (BEPA), a training paradigm that adds a JEPA objective to BERT-style models, combating collapse of the [CLS] embedding space and reshaping it into a language-agnostic space. This new structure improves performance across multilingual benchmarks.
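Since this section only names the objective, the sketch below is a rough illustration of how a JEPA term over [CLS] embeddings could be attached to a BERT-style encoder: a trained context encoder, a frozen EMA target encoder, and a small predictor that regresses the target's [CLS] from the context's. The Predictor MLP, the mBERT checkpoint, the cross-lingual view pairing, and the cosine regression loss are all illustrative assumptions, not BEPA's actual specification.

```python
# Minimal sketch of a JEPA-style auxiliary objective on [CLS] embeddings.
# All names here (Predictor, ema_update, the view pairing) are illustrative
# assumptions, not the paper's specification.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

class Predictor(nn.Module):
    """Small MLP that predicts the target [CLS] from the context [CLS]."""
    def __init__(self, dim: int, hidden: int = 2048):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim)
        )

    def forward(self, x):
        return self.net(x)

# Context encoder receives gradients; the target encoder is an EMA copy
# with gradients disabled -- the standard JEPA recipe against collapse.
context_enc = AutoModel.from_pretrained("bert-base-multilingual-cased")
target_enc = copy.deepcopy(context_enc)
for p in target_enc.parameters():
    p.requires_grad = False
predictor = Predictor(context_enc.config.hidden_size)

@torch.no_grad()
def ema_update(target, context, decay=0.999):
    """Slowly track the context encoder's weights in the target encoder."""
    for pt, pc in zip(target.parameters(), context.parameters()):
        pt.mul_(decay).add_(pc, alpha=1 - decay)

def bepa_loss(batch_a, batch_b):
    """JEPA term: predict the target [CLS] of view B from the context
    [CLS] of view A (e.g., the same sentence in two languages)."""
    cls_a = context_enc(**batch_a).last_hidden_state[:, 0]
    with torch.no_grad():
        cls_b = target_enc(**batch_b).last_hidden_state[:, 0]
    pred = predictor(cls_a)
    # Cosine-style regression between predicted and target embeddings.
    return F.mse_loss(F.normalize(pred, dim=-1), F.normalize(cls_b, dim=-1))

# Usage with a hypothetical translation pair as the two views:
tok = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
view_a = tok(["The cat sits."], return_tensors="pt")
view_b = tok(["Le chat est assis."], return_tensors="pt")
loss = bepa_loss(view_a, view_b)
loss.backward()
ema_update(target_enc, context_enc)
```

In this reading, pulling the predicted [CLS] of one language toward the target [CLS] of another is what would drive the space toward language-agnostic structure, while the EMA target keeps the objective from collapsing to a constant embedding.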