In this technical report, we present our findings on the EPIC-KITCHENS-100 Unsupervised Domain Adaptation task for Action Recognition. Our approach applies a differentiable logic loss during training to exploit the co-occurrence relations between verbs and nouns, and uses pre-trained Large Language Models (LLMs) to generate logic rules for adaptation to unseen action labels. Specifically, the model's predictions are treated as the truth assignment of a co-occurrence logic formula, and the resulting logic loss measures how consistent the predictions are with the logic constraints. Using the verb-noun co-occurrence matrix generated from the dataset, we observe a moderate improvement in model performance over our baseline framework. To further enhance the model's adaptability to novel action labels, we experiment with rules generated by GPT-3.5, which leads to a slight decrease in performance. These findings shed light on the potential and the challenges of incorporating differentiable logic and LLMs for knowledge extraction in unsupervised domain adaptation for action recognition. Our final submission (entitled 'NS-LLM') achieved first place in terms of top-1 action recognition accuracy.
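To make the idea of a co-occurrence logic loss concrete, the following is a minimal sketch under stated assumptions, not the formulation used in the report. It assumes that verb and noun predictions are independent softmax distributions, that the rule set is a binary matrix `allowed[v][n]` (1 if verb `v` may co-occur with noun `n`), and that the loss is the negative log of the probability mass placed on allowed pairs. The function names, the toy labels, and the choice of relaxation are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cooccurrence_logic_loss(verb_logits, noun_logits, allowed, eps=1e-12):
    """Penalize probability mass placed on verb-noun pairs the rules forbid.

    Assumption (not taken from the report): the formula "the predicted
    (verb, noun) pair is an allowed pair" is relaxed to the probability
    sum_{v,n} p(v) * p(n) * allowed[v][n], and the loss is its negative log.
    """
    p_verb = softmax(verb_logits)
    p_noun = softmax(noun_logits)
    satisfaction = sum(
        p_verb[v] * p_noun[n]
        for v in range(len(p_verb))
        for n in range(len(p_noun))
        if allowed[v][n]
    )
    return -math.log(satisfaction + eps)


# Toy rule matrix (hypothetical): verbs = [cut, pour], nouns = [onion, water].
ALLOWED = [
    [1, 0],  # "cut" co-occurs with "onion" only
    [0, 1],  # "pour" co-occurs with "water" only
]

# Prediction favoring (cut, onion): consistent with the rules, so a low loss.
consistent = cooccurrence_logic_loss([4.0, 0.0], [4.0, 0.0], ALLOWED)
# Prediction favoring (cut, water): violates the rules, so a higher loss.
inconsistent = cooccurrence_logic_loss([4.0, 0.0], [0.0, 4.0], ALLOWED)
```

In this sketch the rule matrix is just an input, so it could be filled either from dataset co-occurrence counts or from rules proposed by an LLM. In practice such a term would typically be added to the main training objective with a weighting coefficient and implemented in an automatic differentiation framework.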