This paper introduces a cross-modal learning framework whose objective is to enhance supervised learning in a primary modality using an unlabeled, unpaired secondary modality. Taking a probabilistic approach to estimating the missing information, we show that the extra information contained in the secondary modality can be estimated via Nadaraya-Watson (NW) kernel regression, which in turn can be expressed as a kernelized cross-attention module (under a linear transformation). This expression lays the foundation for The Attention Patch (TAP), a simple neural network add-on that can be trained to enable data-level knowledge transfer from the unlabeled modality. Extensive numerical simulations on real-world datasets show that TAP yields statistically significant improvements in generalization across different domains and different neural network architectures, making use of seemingly unusable unlabeled cross-modal data.
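As a brief illustration of the NW-to-attention connection claimed above (a sketch in our own notation; the projections $W_q, W_k, W_v$ and the specific kernel choice are illustrative assumptions, not necessarily the paper's exact construction): given samples $\{x_i\}_{i=1}^{n}$ from the secondary modality with associated values $z_i$, the NW estimate of the missing quantity $z$ at a query point $x$ is
\[
\hat{z}(x) \;=\; \frac{\sum_{i=1}^{n} K(x, x_i)\, z_i}{\sum_{j=1}^{n} K(x, x_j)} .
\]
Choosing the exponential kernel on linearly transformed inputs, $K(x, x_i) = \exp\!\big(\langle W_q x,\, W_k x_i \rangle / \sqrt{d}\big)$, together with values $z_i = W_v x_i$, turns this estimator into softmax cross-attention,
\[
\hat{z}(x) \;=\; \operatorname{softmax}\!\big(q K^{\top} / \sqrt{d}\big)\, V, \qquad q = W_q x,\quad K_{i\cdot} = (W_k x_i)^{\top},\quad V_{i\cdot} = (W_v x_i)^{\top},
\]
which is one concrete instance of a kernelized cross-attention module under linear transformations, as referenced above.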