Text embeddings are essential components in modern NLP pipelines. Although numerous embedding models have been proposed, no single model consistently dominates across domains and tasks. This variability motivates the use of ensemble techniques to combine complementary strengths. However, most existing ensemble methods operate on deterministic embeddings and fail to account for model-specific uncertainty, limiting their robustness and reliability in downstream applications. To address these limitations, we propose Uncertainty-driven Embedding Convolution (UEC). UEC first transforms deterministic embeddings into probabilistic ones in a post-hoc manner. It then computes adaptive ensemble coefficients based on embedding uncertainty, derived from a principled surrogate-loss formulation. Additionally, UEC employs an uncertainty-aware similarity function that directly incorporates uncertainty into the similarity scoring, providing a theoretically grounded and efficient surrogate to distributional distances. Extensive experiments on diverse benchmarks demonstrate that UEC consistently improves both performance and robustness by leveraging principled uncertainty modeling.