This report presents ReLER's submission to two tracks of the Ego4D Episodic Memory Benchmark at CVPR 2023: Natural Language Queries and Moment Queries. Our solution builds on our proposed Action Sensitivity Learning framework (ASL) to better capture the discrepant information across frames. We further incorporate a series of stronger video features and fusion strategies. Our method achieves an average mAP of 29.34, ranking 1st in the Moment Queries Challenge, and a mean R1 of 19.79, ranking 2nd in the Natural Language Queries Challenge. Our code will be released.