In deep learning, the self-attention mechanism has proven central to a wide range of tasks, including natural language processing and computer vision. Despite its success across diverse applications, the conventional self-attention mechanism computes the query, key, and value (QKV) with linear transformations, which may not always be the optimal choice in certain settings. This paper explores an alternative approach to QKV computation: computing them with a specially designed neural network. Using a modified Marian model, we conducted experiments on the IWSLT 2017 German-English translation dataset and compared our method with the conventional approach. The experimental results show a significant improvement in BLEU scores with our method. Our approach also proved superior when training a RoBERTa model on the WikiText-103 dataset, yielding a notable reduction in perplexity compared with the original model. These results validate the effectiveness of our method and suggest that neural-network-based QKV computation is a promising direction for optimizing the self-attention mechanism in future research and practical applications. The source code and implementation details for the proposed method are available at https://github.com/ocislyjrti/NeuralAttention.
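The abstract does not describe the architecture of the QKV network, so the following is only a minimal sketch under stated assumptions: single-head scaled dot-product attention in plain Python, where each linear Q/K/V projection is swapped for a hypothetical two-layer ReLU MLP (`MLPProjection`). The names, layer sizes, activation, and initialization are illustrative choices, not the authors' design; the linked repository contains the actual implementation.

```python
import math
import random
from typing import Callable, List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # (n x m) @ (m x p) -> (n x p)
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add_bias(a: Matrix, bias: List[float]) -> Matrix:
    return [[x + b for x, b in zip(row, bias)] for row in a]


def relu(a: Matrix) -> Matrix:
    return [[max(0.0, x) for x in row] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax_rows(a: Matrix) -> Matrix:
    # Numerically stable row-wise softmax.
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(x - m) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def init_weights(rows: int, cols: int, rng: random.Random) -> Matrix:
    # Assumed initialization: Gaussian with std 1/sqrt(fan_in).
    scale = 1.0 / math.sqrt(rows)
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class LinearProjection:
    """Conventional QKV projection: x W + b."""

    def __init__(self, d_in: int, d_out: int, rng: random.Random):
        self.w = init_weights(d_in, d_out, rng)
        self.b = [0.0] * d_out

    def __call__(self, x: Matrix) -> Matrix:
        return add_bias(matmul(x, self.w), self.b)


class MLPProjection:
    """Hypothetical neural QKV projection: ReLU(x W1 + b1) W2 + b2."""

    def __init__(self, d_in: int, d_hidden: int, d_out: int, rng: random.Random):
        self.hidden = LinearProjection(d_in, d_hidden, rng)
        self.out = LinearProjection(d_hidden, d_out, rng)

    def __call__(self, x: Matrix) -> Matrix:
        return self.out(relu(self.hidden(x)))


Projection = Callable[[Matrix], Matrix]


def self_attention(x: Matrix, q_proj: Projection, k_proj: Projection, v_proj: Projection):
    # Single-head scaled dot-product attention (no masking); only the Q/K/V maps vary.
    q, k, v = q_proj(x), k_proj(x), v_proj(x)
    d_k = len(q[0])
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, transpose(k))]
    weights = softmax_rows(scores)
    return matmul(weights, v), weights


rng = random.Random(0)
d_model, d_hidden, seq_len = 8, 16, 4
x = [[rng.gauss(0.0, 1.0) for _ in range(d_model)] for _ in range(seq_len)]

# Baseline: linear Q/K/V projections.
base_out, base_w = self_attention(
    x, *(LinearProjection(d_model, d_model, rng) for _ in range(3))
)
# Variant: MLP-based Q/K/V projections.
mlp_out, mlp_w = self_attention(
    x, *(MLPProjection(d_model, d_hidden, d_model, rng) for _ in range(3))
)
```

Because `self_attention` accepts any callable projection, the linear baseline and the MLP variant differ only in how Q, K, and V are produced; the softmax weighting and value aggregation are unchanged. The MLP adds a hidden layer and a nonlinearity, at the cost of extra parameters and compute in every attention layer.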