In this report, we present our solutions to the EgoVis Challenges at CVPR 2024, covering five tracks of the Ego4D challenge and three tracks of the EPIC-Kitchens challenge. Building upon a video-language two-tower model and our carefully curated egocentric video data, we introduce a novel foundation model called EgoVideo. This model is specifically designed for the unique characteristics of egocentric videos and underpins all of our competition submissions. In the Ego4D challenges, we tackle Natural Language Queries, Step Grounding, Moment Queries, Short-term Object Interaction Anticipation, and Long-term Action Anticipation. In the EPIC-Kitchens challenge, we compete in the Action Recognition, Multiple Instance Retrieval, and Domain Adaptation for Action Recognition tracks. By adapting EgoVideo to these diverse tasks, we showcase its versatility and effectiveness across egocentric video analysis scenarios, demonstrating its powerful representation ability as an egocentric foundation model. Our codebase and pretrained models are publicly available at https://github.com/OpenGVLab/EgoVideo.
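For readers unfamiliar with the two-tower formulation, the following sketch illustrates the general idea: one tower embeds video clips and the other embeds text, and the two embeddings are matched by similarity in a shared space. This is a minimal, hypothetical illustration; the module names, feature dimensions, and pooling choices below are assumptions for exposition and do not reflect EgoVideo's actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoTowerModel(nn.Module):
    """Illustrative video-language two-tower model (not EgoVideo itself)."""

    def __init__(self, embed_dim: int = 512):
        super().__init__()
        # Placeholder video tower: mean-pool precomputed frame features,
        # then project into the shared embedding space.
        self.video_proj = nn.Linear(768, embed_dim)
        # Placeholder text tower: mean-pool token features, then project.
        self.text_proj = nn.Linear(512, embed_dim)

    def forward(self, frame_feats: torch.Tensor, token_feats: torch.Tensor):
        # frame_feats: (batch, num_frames, 768)
        # token_feats: (batch, num_tokens, 512)
        v = F.normalize(self.video_proj(frame_feats.mean(dim=1)), dim=-1)
        t = F.normalize(self.text_proj(token_feats.mean(dim=1)), dim=-1)
        # Similarity matrix: entry (i, j) scores video i against text j,
        # which supports retrieval-style tasks such as language queries.
        return v @ t.t()

model = TwoTowerModel()
sim = model(torch.randn(4, 16, 768), torch.randn(4, 32, 512))
print(sim.shape)  # torch.Size([4, 4])
```

A shared embedding space of this kind is what makes one backbone adaptable to retrieval, grounding, and recognition tracks alike, since each task reduces to scoring video segments against task-specific text.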