In this paper, we introduce Actions and Objects Pathways (AOPath) for out-of-domain generalization in video question answering. AOPath leverages features from a large pretrained model to enhance generalizability without explicit training on unseen domains. Inspired by the human brain, AOPath dissociates the pretrained features into action and object features and processes them through separate reasoning pathways. It employs a novel module that converts out-of-domain features into domain-agnostic features without introducing any trainable weights. We validate the proposed approach on the TVQA dataset, which is partitioned into multiple subsets based on genre to facilitate the assessment of generalizability. The proposed approach outperforms conventional classifiers by 5% and 4% on out-of-domain and in-domain datasets, respectively. It also outperforms prior methods that train millions of parameters, while itself training very few.