Raw videos have been shown to contain considerable feature redundancy: in many cases, a subset of the frames already suffices for accurate recognition. In this paper, we investigate whether this redundancy can be leveraged for efficient inference in continuous sign language recognition (CSLR). We propose a novel adaptive model, AdaBrowse, which dynamically selects the most informative subsequence from an input video by modelling the selection as a sequential decision task. Specifically, we first use a lightweight network to quickly scan the input video and extract coarse features. These features are then fed into a policy network that selects a subsequence to process. Finally, a standard CSLR model performs sentence prediction on the selected subsequence. Because only a portion of the frames is processed, the total computation is considerably reduced. Beyond temporal redundancy, we also investigate whether the inherent spatial redundancy can be exploited jointly for further efficiency by dynamically selecting the lowest input resolution for each sample; we refer to the resulting model as AdaBrowse+. Extensive experiments on four large-scale CSLR datasets, i.e., PHOENIX14, PHOENIX14-T, CSL-Daily and CSL, demonstrate the effectiveness of AdaBrowse and AdaBrowse+, which achieve accuracy comparable to state-of-the-art methods with 1.44$\times$ throughput and 2.12$\times$ fewer FLOPs. Comparisons with other commonly used 2D CNNs and adaptive efficient methods further verify the effectiveness of AdaBrowse. Code is available at \url{https://github.com/hulianyuyy/AdaBrowse}.
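The scan, select, then recognize flow described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the released AdaBrowse implementation: the learned policy network is replaced here by a hand-written heuristic, and every name (`coarse_features`, `select_subsequence`, `adaptive_inference`), the strided candidate set, and the `cost_weight` trade-off are assumptions introduced only for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Frame = List[float]  # toy stand-in: a frame is a flat list of pixel values


def coarse_features(frames: Sequence[Frame]) -> List[float]:
    """Cheap scan: one scalar per frame (mean intensity).
    Stands in for the lightweight network; purely illustrative."""
    return [sum(f) / len(f) for f in frames]


def retained_variation(features: Sequence[float], indices: Sequence[int]) -> float:
    """Total absolute change between consecutive selected frames."""
    picked = [features[i] for i in indices]
    return sum(abs(b - a) for a, b in zip(picked, picked[1:]))


def select_subsequence(
    features: Sequence[float],
    strides: Sequence[int] = (1, 2, 4),  # assumed candidate set
    cost_weight: float = 0.5,            # assumed accuracy/cost trade-off
) -> List[int]:
    """Heuristic stand-in for the policy network: pick the strided
    candidate that keeps the most variation for the fewest frames."""
    n = len(features)
    if n == 0:
        return []
    total = retained_variation(features, range(n))
    best_indices: List[int] = []
    best_score = float("-inf")
    for stride in strides:
        indices = list(range(0, n, stride))
        # A static clip loses nothing under subsampling.
        kept = retained_variation(features, indices) / total if total > 0 else 1.0
        score = kept - cost_weight * len(indices) / n
        if score > best_score:
            best_indices, best_score = indices, score
    return best_indices


def adaptive_inference(
    frames: Sequence[Frame],
    heavy_model: Callable[[List[Frame]], object],  # hypothetical CSLR model
    strides: Sequence[int] = (1, 2, 4),
    cost_weight: float = 0.5,
) -> Tuple[object, List[int]]:
    """Scan all frames cheaply, then run the heavy model on the selection only."""
    indices = select_subsequence(coarse_features(frames), strides, cost_weight)
    return heavy_model([frames[i] for i in indices]), indices
```

Under these assumptions, a static clip is subsampled aggressively while a rapidly changing clip keeps every frame, so the expensive model sees fewer frames only when little information is lost. In the method described above this decision is made by a trained policy network rather than by a fixed rule.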