AdaBrowse: Adaptive Video Browser for Efficient Continuous Sign Language Recognition

Raw videos have been shown to contain considerable feature redundancy: in many cases, only a portion of the frames is enough for accurate recognition. In this paper, we investigate whether such redundancy can be leveraged for efficient inference in continuous sign language recognition (CSLR). We propose a novel adaptive model (AdaBrowse) that dynamically selects the most informative subsequence from an input video by modelling the problem as a sequential decision task. Specifically, we first use a lightweight network to quickly scan the input video and extract coarse features. These features are then fed into a policy network that selects a subsequence to process. Finally, a normal CSLR model runs inference on the selected subsequence to predict the sentence. As only a portion of the frames is processed, the total computation is considerably reduced. Beyond temporal redundancy, we also investigate whether the inherent spatial redundancy can be exploited at the same time for further efficiency, i.e., by dynamically selecting the lowest input resolution for each sample; we refer to the resulting model as AdaBrowse+. Extensive experiments on four large-scale CSLR datasets, i.e., PHOENIX14, PHOENIX14-T, CSL-Daily and CSL, demonstrate the effectiveness of AdaBrowse and AdaBrowse+, which achieve accuracy comparable to state-of-the-art methods with 1.44$\times$ throughput and 2.12$\times$ fewer FLOPs. Comparisons with other commonly used 2D CNNs and adaptive efficient methods verify the effectiveness of AdaBrowse. Code is available at \url{https://github.com/hulianyuyy/AdaBrowse}.
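The scan, select, then recognize flow described above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the AdaBrowse code: frames are plain lists of floats, `coarse_scan` stands in for the lightweight network, `motion_policy` is a hand-written rule standing in for the learned policy network, and candidate subsequences are assumed to be uniform strides.

```python
from typing import List, Sequence, Tuple

Frame = List[float]  # toy stand-in for an image (assumption, not the real input type)


def coarse_scan(frames: Sequence[Frame]) -> List[float]:
    """Cheap per-frame feature: mean intensity (stand-in for a lightweight network)."""
    return [sum(f) / len(f) for f in frames]


def motion_policy(coarse: Sequence[float], strides: Sequence[int], threshold: float) -> int:
    """Hypothetical rule-based policy: skip aggressively when frames barely change.

    The paper uses a learned policy network; this rule only mimics its role.
    """
    diffs = [abs(b - a) for a, b in zip(coarse, coarse[1:])]
    mean_change = sum(diffs) / len(diffs) if diffs else 0.0
    # High change: keep every frame. Low change: use the largest stride.
    return min(strides) if mean_change > threshold else max(strides)


def browse(
    frames: Sequence[Frame],
    strides: Sequence[int] = (1, 2, 4),
    threshold: float = 0.1,
) -> Tuple[List[int], int]:
    """Return the indices of the frames to pass to the full recognizer, plus the stride."""
    coarse = coarse_scan(frames)
    stride = motion_policy(coarse, strides, threshold)
    kept = list(range(0, len(frames), stride))
    return kept, stride


if __name__ == "__main__":
    static_video = [[0.5, 0.5] for _ in range(8)]            # nothing changes
    moving_video = [[float(i), float(i)] for i in range(8)]  # every frame differs
    print(browse(static_video))
    print(browse(moving_video))
```

In this sketch only the frames at the returned indices would be passed to the full CSLR model, so the heavy model's per-frame cost shrinks roughly in proportion to the chosen stride, while the coarse scan still touches every frame.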

