We propose the Multi-Head Density Adaptive Attention Mechanism (DAAM), a novel probabilistic attention framework that can be used for Parameter-Efficient Fine-Tuning (PEFT), and the Density Adaptive Transformer (DAT), designed to enhance information aggregation across multiple modalities, including speech, text, and vision. DAAM integrates learnable mean and variance into its attention mechanism, implemented in a multi-head framework that enables the heads to collectively model any probability distribution for dynamic recalibration of feature significance. This method yields significant improvements, especially on highly non-stationary data, surpassing state-of-the-art attention techniques with gains of up to approximately +20% (absolute) in accuracy. Empirically, DAAM exhibits superior adaptability and efficacy across a diverse range of tasks, including emotion recognition in speech, image classification, and text classification, establishing its robustness and versatility in handling data across multiple modalities. Furthermore, we introduce the Importance Factor, a new learning-based metric that enhances the explainability of models trained with DAAM-based methods.
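To make the core idea concrete, the following is a minimal sketch of density-adaptive gating with learnable per-head mean and variance, where multiple Gaussian heads act as a mixture that can approximate a flexible density over feature values. The function names (`daam_head`, `multi_head_daam`) and the scalar per-head parameterization are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def daam_head(x, mu, log_var):
    # Gaussian density-based gating: each feature is weighted by how close
    # it lies to a learnable mean, scaled by a learnable variance.
    var = np.exp(log_var)                    # log-parameterization keeps variance positive
    weights = np.exp(-0.5 * (x - mu) ** 2 / var)  # unnormalized Gaussian weights in (0, 1]
    return x * weights                       # dynamically recalibrated features

def multi_head_daam(x, mus, log_vars):
    # Averaging several heads with different (mu, var) gives a
    # mixture-of-Gaussians weighting, which grows more expressive
    # as the number of heads increases.
    heads = [daam_head(x, m, lv) for m, lv in zip(mus, log_vars)]
    return np.mean(heads, axis=0)

# Toy example: two heads centered at different feature values.
x = np.array([0.0, 1.0, 2.0, 3.0])
out = multi_head_daam(x, mus=[0.0, 2.0], log_vars=[0.0, 0.0])
```

In a trainable setting, `mu` and `log_var` would be model parameters updated by gradient descent alongside the rest of the network.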