Neural Attention: Enhancing QKV Calculation in Self-Attention Mechanism with Neural Networks

The self-attention mechanism plays a central role in deep learning across many tasks, including natural language processing and computer vision. Despite this success, conventional self-attention computes the query, key, and value (QKV) with linear transformations, which may not be the best choice in every setting. This paper explores a different approach to QKV computation: computing QKV with a specially designed neural network structure. Using a modified Marian model, we ran experiments on the IWSLT 2017 German-English translation dataset and compared our method with the conventional approach. The results show a significant improvement in BLEU scores with our method. Our approach also outperformed the original when training the RoBERTa model on the WikiText-103 dataset, with a notable reduction in model perplexity. These results support the effectiveness of our method and suggest substantial potential in optimizing the self-attention mechanism through neural network-based QKV computation, for both future research and practical applications. The source code and implementation details for our proposed method are available at https://github.com/ocislyjrti/NeuralAttention.
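To make the contrast concrete, the following is a minimal NumPy sketch of single-head self-attention in which Q, K, and V come either from a linear map (the conventional approach) or from a small network. The abstract does not specify the architecture of the proposed network, so the two-layer ReLU MLP used here, along with the names `linear_proj`, `mlp_proj`, `init_mlp`, and the sizes `d_model` and `d_hidden`, are illustrative assumptions and not the authors' implementation; the repository linked above is the authoritative reference.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def linear_proj(x, w, b):
    # Conventional QKV projection: a single affine map.
    return x @ w + b

def mlp_proj(x, w1, b1, w2, b2):
    # Assumed stand-in for a neural QKV projection:
    # affine map -> ReLU -> affine map (hypothetical architecture).
    hidden = np.maximum(0.0, x @ w1 + b1)
    return hidden @ w2 + b2

def self_attention(q, k, v):
    # Scaled dot-product attention for one head, no masking.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights

def init_mlp(rng, d_model, d_hidden):
    # Small random weights and zero biases, for illustration only.
    return (rng.normal(0.0, 0.1, (d_model, d_hidden)), np.zeros(d_hidden),
            rng.normal(0.0, 0.1, (d_hidden, d_model)), np.zeros(d_model))

rng = np.random.default_rng(0)
seq_len, d_model, d_hidden = 5, 8, 16
x = rng.normal(size=(seq_len, d_model))

# Baseline: linear Q, K, V projections.
lin = [(rng.normal(0.0, 0.1, (d_model, d_model)), np.zeros(d_model))
       for _ in range(3)]
out_linear, w_linear = self_attention(*(linear_proj(x, w, b) for w, b in lin))

# Variant: separate small networks for Q, K, and V.
mlps = [init_mlp(rng, d_model, d_hidden) for _ in range(3)]
out_neural, w_neural = self_attention(*(mlp_proj(x, *p) for p in mlps))

print(out_linear.shape, out_neural.shape)
```

Both variants feed the same attention function, so only the projection step changes; in this sketch the MLP variant adds a hidden layer and a nonlinearity per projection, at the cost of extra parameters and computation.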

