This paper studies a cross-modal learning framework in which the objective is to improve supervised learning in a primary modality using an unlabeled, unpaired secondary modality. Taking a probabilistic approach to missing-information estimation, we show that the extra information contained in the secondary modality can be estimated via Nadaraya-Watson (NW) kernel regression, which can in turn be expressed as a kernelized cross-attention module (under a linear transformation). These results lay the foundation for The Attention Patch (TAP), a simple neural network add-on that allows data-level knowledge transfer from the unlabeled modality. Extensive numerical simulations on four real-world datasets show that TAP can provide statistically significant improvements in generalization across different domains and neural network architectures, making use of seemingly unusable unlabeled cross-modal data.
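The link between NW kernel regression and cross-attention can be illustrated with a minimal NumPy sketch. It is not the TAP module itself: it assumes a Gaussian kernel with a fixed bandwidth (the abstract does not name a kernel), omits the learned linear transformations, and uses hypothetical function names (`nw_regression`, `kernel_cross_attention`). Under those assumptions, the NW estimate over reference points from the secondary modality equals a softmax cross-attention whose scores are scaled dot products minus a per-key norm term.

```python
import numpy as np


def nw_regression(queries, keys, values, bandwidth=1.0):
    """Nadaraya-Watson estimate with a Gaussian kernel (assumed kernel choice)."""
    # Pairwise squared distances, shape (n_queries, n_keys).
    d2 = ((queries[:, None, :] - keys[None, :, :]) ** 2).sum(axis=-1)
    weights = np.exp(-d2 / (2.0 * bandwidth ** 2))
    # Normalise each row so the weights for one query sum to 1.
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ values


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def kernel_cross_attention(queries, keys, values, bandwidth=1.0):
    """Softmax cross-attention that reproduces the Gaussian NW estimate.

    Expanding -||q - k||^2 / (2 h^2) gives (q.k - ||k||^2 / 2) / h^2 plus a
    term depending only on q, which cancels inside the row-wise softmax.
    """
    key_norms = 0.5 * (keys ** 2).sum(axis=-1)[None, :]
    scores = (queries @ keys.T - key_norms) / bandwidth ** 2
    return softmax(scores, axis=1) @ values


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = rng.normal(size=(5, 4))   # primary-modality features (queries)
    k = rng.normal(size=(7, 4))   # secondary-modality reference points (keys)
    v = rng.normal(size=(7, 3))   # information attached to each reference point
    same = np.allclose(nw_regression(q, k, v), kernel_cross_attention(q, k, v))
    print("NW regression matches cross-attention:", same)
```

Here the queries stand in for primary-modality features and the keys and values for unlabeled secondary-modality samples. In a trainable module, the queries, keys, and values would presumably each pass through a learned linear map before this computation.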