RhinoVLA Technical Report

Huixi Intelligence, :,Chen Zhang,Chenyang Zhou,Guanglei Ding,Guanghui He,Haibin Gao,Jiajia Chen,Jianyong Zhang,Lianyi Yu,Ningyi Xu,Ping Xu,Qingchen Li,Yingjun Hu,Yijia Zhang,Yuxi Liu

Vision-Language-Action (VLA) models have shown strong potential for robotic manipulation, but real-time deployment on edge hardware remains challenging. In this work, we identify VLM visual and context tokens as a major source of deployment latency: for GEMM-dominated projection operators, computation grows linearly with the number of input tokens when model dimensions are fixed. Motivated by this observation, we propose RhinoVLA, a deployment-oriented VLA model co-designed with the Huixi R1 edge SoC. RhinoVLA adopts a token-efficient Qwen3-VL backbone and a continuous Action Expert, reducing the VLM-side token and computation burden while preserving pretrained multimodal capability. To support cross-robot learning, RhinoVLA further introduces a unified interface that combines View Registry, 72D physical state-action slot space, and robotinstance LoRA, allowing heterogeneous robot observations and action schemas to be aligned under a shared policy. On the deployment side, RhinoVLA is optimized through hardware-aware compilation, mixed-precision execution, and parallel visual encoding. Experiments show that RhinoVLA achieves downstream performance comparable to π0.5 at a similar parameter scale, while reaching 11.69 Hz end-to-end inference on Huixi R1, meeting the 10 Hz real-time closedloop control target. The project will be open-sourced at https://github.com/HuixiAI/RhinoVLA.

翻译：暂无翻译

相关内容

MoDELS

关注 45

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/

【ICML 2026】 StableVLA：无需额外数据，基于信息瓶颈的自适应鲁棒性视觉-语言-动作模型

专知会员服务

6+阅读 · 5月19日

CVPR 2026 Findings | 算力砍半、性能不降！全开源 A₁模型：让机器人大模型真正走向落地

专知会员服务

15+阅读 · 4月12日

《迈向快速规模化可信人工智能的联盟军事应用之路》最新报告

专知会员服务

22+阅读 · 2月24日

视觉-语言-动作（VLA）模型的前世今生

专知会员服务

21+阅读 · 2025年8月29日