Scene Understanding Papers - Zhuanzhi (专知)

Scene Understanding

3D-RFT: Reinforcement Fine-Tuning for Video-based 3D Scene Understanding

Arxiv

0+ reads · June 12

Latent Diffusion Policy: Shaping Latent Spaces for Diffusion-Based Robotic Manipulation

Arxiv

0+ reads · June 7

RISE: Single Static Radar-based Indoor Scene Understanding

Arxiv

0+ reads · June 5

Generation Models Know Space: Unleashing Implicit 3D Priors for Scene Understanding

Arxiv

0+ reads · March 19

The Limits of Learning from Pictures and Text: Vision-Language Models and Embodied Scene Understanding

Arxiv

0+ reads · March 27

VL-KnG: Persistent Spatiotemporal Knowledge Graphs from Egocentric Video for Embodied Scene Understanding

Arxiv

0+ reads · March 24

Seeing Without Eyes: 4D Human-Scene Understanding from Wearable IMUs

Arxiv

0+ reads · April 23

CrashSight: A Phase-Aware, Infrastructure-Centric Video Benchmark for Traffic Crash Scene Understanding and Reasoning

Arxiv

0+ reads · April 10

CrashSight: A Phase-Aware, Infrastructure-Centric Video Benchmark for Traffic Crash Scene Understanding and Reasoning

Arxiv

0+ reads · April 9

ShelfGaussian: Shelf-Supervised Open-Vocabulary Gaussian-based 3D Scene Understanding

Arxiv

0+ reads · April 6

Beyond Referring Expressions: Scenario Comprehension Visual Grounding

Arxiv

0+ reads · April 2

AutoMoT: A Unified Vision-Language-Action Model with Asynchronous Mixture-of-Transformers for End-to-End Autonomous Driving

Arxiv

0+ reads · March 18

P$^{3}$Nav: End-to-End Perception, Prediction and Planning for Vision-and-Language Navigation

Arxiv

0+ reads · March 18

OGScene3D: Incremental Open-Vocabulary 3D Gaussian Scene Graph Mapping for Scene Understanding

Arxiv

0+ reads · March 17

AutoMoT: A Unified Vision-Language-Action Model with Asynchronous Mixture-of-Transformers for End-to-End Autonomous Driving

Arxiv

0+ reads · March 16