Video Understanding Papers - Zhuanzhi

Video Understanding

LiveStarPro: Proactive Streaming Video Understanding with Hierarchical Memory for Long-Horizon Streams

Arxiv

0+ reads · June 16

Decoupled Object-Centric Video Understanding for Generating Robotic Manipulation Commands

Arxiv

0+ reads · June 15

FrameOracle: Learning What to See and How Much to See in Videos

Arxiv

0+ reads · June 13

What Should a Streaming Video Model Remember?

Arxiv

0+ reads · June 15

Q-Fold: Query-Aware Focus-Context Spatio-Temporal Folding for Long Video Understanding

Arxiv

0+ reads · June 10

Temporal2Seq: A Unified Framework for Temporal Video Understanding Tasks

Arxiv

0+ reads · June 10

From Content to Knowledge: Lightning Fast Long-Video Understanding with Neural Knowledge Representations

Arxiv

0+ reads · June 10

ViMU: Benchmarking Video Metaphorical Understanding

Arxiv

0+ reads · May 14

Watch, Remember, Reason: Human-View Video Understanding with MLLMs

Arxiv

0+ reads · June 5

MedHorizon: Towards Long-context Medical Video Understanding in the Wild

Arxiv

0+ reads · May 7

Mobile-VideoGPT: Fast and Accurate Model for Mobile Video Understanding

Arxiv

0+ reads · March 19

Exploring High-Order Self-Similarity for Video Understanding

Arxiv

0+ reads · April 22

VideoARM: Agentic Reasoning over Hierarchical Memory for Long-Form Video Understanding

Arxiv

0+ reads · March 30

Video Panels for Long Video Understanding

Arxiv

0+ reads · April 20

StreamGaze: Gaze-Guided Temporal Reasoning and Proactive Understanding in Streaming Videos

Arxiv

0+ reads · March 27
