We introduce a method called the Expansion mechanism, which processes the input unconstrained by the number of elements in the sequence. This allows the model to learn more effectively than traditional attention-based approaches. To support this claim, we design a novel architecture, ExpansionNet v2, which achieves strong results on the MS COCO 2014 Image Captioning challenge and the state of the art in its respective category, with scores of 143.7 CIDEr-D on the offline test split, 140.8 CIDEr-D on the online evaluation server, and 72.9 AllCIDEr on the nocaps validation set. Additionally, we introduce an end-to-end training algorithm that is up to 2.8 times faster than established alternatives. Source code is available at: https://github.com/jchenghu/ExpansionNet_v2