We propose the Multi-Head Gaussian Adaptive Attention Mechanism (GAAM), a novel probabilistic attention framework, and the Gaussian Adaptive Transformer (GAT), designed to enhance information aggregation across multiple modalities, including speech, text, and vision. GAAM integrates learnable mean and variance into its attention mechanism, implemented within a multi-headed framework, enabling it to collectively model any probability distribution for dynamic recalibration of feature significance. This method demonstrates significant improvements, especially with highly non-stationary data, surpassing state-of-the-art attention techniques in model performance (up to approximately +20% in accuracy) by identifying key elements within the feature space. GAAM's compatibility with dot-product-based attention models and its relatively small parameter count showcase its adaptability and potential to boost existing attention frameworks. Empirically, GAAM exhibits superior adaptability and efficacy across a diverse range of tasks, including emotion recognition in speech, image classification, and text classification, thereby establishing its robustness and versatility in handling multi-modal data. Furthermore, we introduce the Importance Factor (IF), a new learning-based metric that enhances the explainability of models trained with GAAM-based methods. Overall, GAAM represents an advancement towards the development of better-performing and more explainable attention models across multiple modalities.
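To make the core idea concrete, the sketch below illustrates how attention weights could be derived from a learnable Gaussian (mean and variance) per head, rather than from dot products. This is a minimal NumPy illustration of the general principle, not the authors' implementation; the function names, the per-feature normalization, and the per-head parameter layout are all illustrative assumptions.

```python
import numpy as np

def gaussian_adaptive_attention(x, mu, var, eps=1e-6):
    """Re-weight features by a learnable Gaussian density (illustrative sketch).

    x:   (seq_len, dim) input features for one head
    mu:  (dim,) learnable mean  (assumed learned per head; fixed here for demo)
    var: (dim,) learnable variance
    Returns re-weighted features of the same shape as x.
    """
    # Gaussian score per element: large when x is close to the learned mean
    scores = np.exp(-0.5 * (x - mu) ** 2 / (var + eps))
    # Normalize scores over the sequence axis so they act as attention weights
    weights = scores / (scores.sum(axis=0, keepdims=True) + eps)
    # Scale by seq_len so the output magnitude stays comparable to the input
    return weights * x * x.shape[0]

def multi_head_gaam(x, mus, vars_):
    """Split the feature dimension across heads; each head has its own Gaussian."""
    heads = np.split(x, len(mus), axis=-1)
    out = [gaussian_adaptive_attention(h, m, v)
           for h, m, v in zip(heads, mus, vars_)]
    return np.concatenate(out, axis=-1)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))        # 4 positions, 8-dim features
mus = [np.zeros(4), np.zeros(4)]   # 2 heads, 4 dims each (would be learned)
vars_ = [np.ones(4), np.ones(4)]
y = multi_head_gaam(x, mus, vars_)
print(y.shape)  # (4, 8)
```

In training, `mu` and `var` would be gradient-updated parameters; because each head can learn a different mean and variance, the heads jointly approximate a mixture of Gaussians over the feature space, which is what allows the mechanism to model flexible distributions and recalibrate feature significance dynamically.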