Impact of visual assistance for automated audio captioning

We study the impact of visual assistance for automated audio captioning. Utilizing multi-encoder transformer architectures, which have previously been employed to introduce vision-related information in the context of sound event detection, we analyze the usefulness of incorporating a variety of pretrained features. We perform experiments on a YouTube-based audiovisual data set and investigate the effect of applying the considered transfer learning technique in terms of a variety of captioning metrics. We find that only one of the considered kinds of pretrained features provides consistent improvements, while the others do not provide any noteworthy gains at all. Interestingly, the outcomes of prior research efforts indicate that the exact opposite is true in the case of sound event detection, leading us to conclude that the optimal choice of visual embeddings is strongly dependent on the task at hand. More specifically, visual features focusing on semantics appear appropriate in the context of automated audio captioning, while for sound event detection, time information seems to be more important.

翻译：我们研究了视觉辅助对自动音频字幕生成的影响。利用此前在声音事件检测中引入视觉相关信息的多编码器Transformer架构，我们分析了融入多种预训练特征的有效性。我们在基于YouTube的视听数据集上进行了实验，并从多种字幕生成指标角度考察了所采用的迁移学习技术的效果。结果发现，仅有一类预训练特征能带来持续改进，而其他特征并未产生显著增益。有趣的是，此前研究结果表明，在声音事件检测中情况恰恰相反，这使我们得出结论：视觉嵌入的最优选择高度依赖于具体任务。具体而言，在自动音频字幕生成中，聚焦语义的视觉特征更为适用；而在声音事件检测中，时间信息似乎更为重要。

相关内容

Automator

关注 5

Automator是苹果公司为他们的Mac OS X系统开发的一款软件。 只要通过点击拖拽鼠标等操作就可以将一系列动作组合成一个工作流，从而帮助你自动的（可重复的）完成一些复杂的工作。Automator还能横跨很多不同种类的程序，包括：查找器、Safari网络浏览器、iCal、地址簿或者其他的一些程序。它还能和一些第三方的程序一起工作，如微软的Office、Adobe公司的Photoshop或者Pixelmator等。

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【Google可解释人工智能白皮书】27页pdf，AI Explainability Whitepaper ，Introduction to AI Explanations for AI Platform

专知会员服务

128+阅读 · 2019年12月13日

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

专知会员服务

15+阅读 · 2019年10月23日