RelaMiX: Exploring Few-Shot Adaptation in Video-based Action Recognition

Domain adaptation is essential for activity recognition to ensure accurate and robust performance across diverse environments, sensor types, and data sources. Unsupervised domain adaptation methods have been studied extensively, yet they require large-scale unlabeled data from the target domain. In this work, we address Few-Shot Domain Adaptation for video-based Activity Recognition (FSDA-AR), which leverages a very small number of labeled target videos to achieve effective adaptation. This setting is attractive for applications because it requires recording and labeling only a few examples, or even a single example, per class in the target domain, which often includes activities that are rare yet crucial to recognize. We construct FSDA-AR benchmarks from five established datasets covering diverse domain types: UCF101, HMDB51, EPIC-KITCHEN, Sims4Action, and ToyotaSmartHome. Our results demonstrate that FSDA-AR performs comparably to unsupervised domain adaptation while using significantly fewer (yet labeled) target domain samples. We further propose a novel approach, RelaMiX, to better leverage the few labeled target domain samples as knowledge guidance. RelaMiX comprises a temporal relational attention network with relation dropout and a cross-domain information alignment mechanism. It also mixes features in a latent space using the few-shot target domain samples. RelaMiX achieves state-of-the-art performance on all datasets within the FSDA-AR benchmark. To encourage future research on few-shot domain adaptation for video-based activity recognition, our benchmarks and source code are made publicly available at https://github.com/KPeng9510/RelaMiX.
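To make the idea of mixing features in a latent space more concrete, the following is a minimal sketch of a generic mixup-style interpolation between source-domain features and few-shot target-domain features of the same class. It is not the RelaMiX implementation: the function names (`mix_latent_features`, `augment_with_target_shots`), the same-class pairing rule, the uniform sampling of the mixing weight, and the `lam_low` parameter are all assumptions made for illustration; the released repository should be consulted for the actual method.

```python
import random


def mix_latent_features(source_feat, target_feat, lam):
    """Return the convex combination lam * source + (1 - lam) * target.

    Hypothetical helper: both inputs are plain lists of floats that
    stand in for latent feature vectors of equal length.
    """
    if len(source_feat) != len(target_feat):
        raise ValueError("feature vectors must have the same length")
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return [lam * s + (1.0 - lam) * t for s, t in zip(source_feat, target_feat)]


def augment_with_target_shots(source_batch, target_shots, rng, lam_low=0.5):
    """Mix each source feature with a random same-class target shot.

    source_batch: list of (feature, label) pairs from the source domain.
    target_shots: dict mapping label -> list of few-shot target features.
    rng:          a random.Random instance, passed in for reproducibility.
    lam_low:      assumed lower bound on the source weight, so the mixed
                  feature stays closer to the source sample.
    """
    mixed = []
    for feat, label in source_batch:
        shots = target_shots.get(label)
        if not shots:
            # No labeled target example for this class: keep the feature as is.
            mixed.append((feat, label))
            continue
        lam = rng.uniform(lam_low, 1.0)
        mixed.append((mix_latent_features(feat, rng.choice(shots), lam), label))
    return mixed


if __name__ == "__main__":
    rng = random.Random(0)
    batch = [([1.0, 1.0], 0), ([2.0, 2.0], 1)]
    shots = {0: [[3.0, 3.0]]}  # one labeled target shot for class 0 only
    for feat, label in augment_with_target_shots(batch, shots, rng):
        print(label, feat)
```

Pairing only same-class samples keeps the label unchanged after mixing, which is one simple design choice among several; mixing across classes would also require interpolating the labels.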
