We present a parameter-efficient method for continual video question-answering (VidQA) learning. Our method, named DAM, uses the proposed Dynamic Adapter Merging to (i) mitigate catastrophic forgetting, (ii) enable efficient adaptation to continually arriving datasets, (iii) handle inputs from unknown datasets during inference, and (iv) enable knowledge sharing across similar dataset domains. Given a set of continually streaming VidQA datasets, we sequentially train one adapter per dataset while keeping the parameters of a large pretrained video-language backbone frozen. During inference, given a video-question sample from an unknown domain, our method first uses the proposed non-parametric router function to compute a probability for each adapter, reflecting how relevant that adapter is to the current video-question input. The proposed dynamic adapter merging scheme then aggregates all the adapter weights into a new adapter instance tailored to that particular test sample, which is used to compute the final VidQA prediction. Merging in this way mitigates the impact of inaccurate router predictions and facilitates knowledge sharing across domains. DAM outperforms prior state-of-the-art continual learning approaches by 9.1% while exhibiting 1.9% less forgetting on 6 VidQA datasets spanning various domains. We further extend DAM to continual image classification and image QA, where it also outperforms prior methods by a large margin. The code is publicly available at: https://github.com/klauscc/DAM
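The inference procedure described above (route, merge, then predict) can be illustrated with a minimal NumPy sketch. The abstract does not specify how the router scores adapters or what form the adapters take, so several details here are assumptions made purely for illustration: routing by cosine similarity to per-dataset feature centroids with a softmax temperature, a two-matrix bottleneck adapter with a residual connection, and the hypothetical names `router_probs`, `merge_adapters`, and `adapter_forward`. None of these are taken from the released repository.

```python
import numpy as np


def router_probs(feature, centroids, temperature=0.1):
    """Assumed non-parametric router: softmax over cosine similarity
    between the input feature and one stored centroid per dataset."""
    f = feature / np.linalg.norm(feature)
    c = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
    logits = (c @ f) / temperature
    logits = logits - logits.max()  # subtract max for numerical stability
    p = np.exp(logits)
    return p / p.sum()


def merge_adapters(adapters, probs):
    """Build one per-sample adapter as the probability-weighted
    average of every adapter's parameters, matched by name."""
    return {
        name: sum(p * a[name] for p, a in zip(probs, adapters))
        for name in adapters[0]
    }


def adapter_forward(h, adapter):
    """Assumed bottleneck adapter: h + relu(h @ down) @ up."""
    return h + np.maximum(h @ adapter["down"], 0.0) @ adapter["up"]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim, bottleneck, num_datasets = 8, 2, 3

    # One centroid and one adapter per previously seen dataset (toy values).
    centroids = rng.normal(size=(num_datasets, dim))
    adapters = [
        {"down": rng.normal(size=(dim, bottleneck)),
         "up": rng.normal(size=(bottleneck, dim))}
        for _ in range(num_datasets)
    ]

    # A test sample whose dataset identity is unknown at inference time.
    feature = rng.normal(size=dim)
    probs = router_probs(feature, centroids)
    merged = merge_adapters(adapters, probs)
    output = adapter_forward(feature[None, :], merged)
    print(probs, output.shape)
```

Because the weights are averaged rather than a single adapter being selected, a sample that the router scores ambiguously still receives a contribution from every plausible adapter. This is one way to read the abstract's claim that merging softens the effect of router mistakes.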