Kairos: A Native World Model Stack for Physical AI

Kairos Team, Fei Wang, Shan You, Qiming Zhang, Tao Huang, Zuoyi Fu, Zhisheng Zheng, Yunlong Xi, Feng Lv, Xiaoming Wu, Zeyu Liu, Cong Wan, Pu Li, Ruiqing Yang, Xiaoou Li, Wei Wang, Kangkang Zhu, Yuwei Zhang, Shi Fu, Xiaoning Wu, Xuzeng Fan, Dacheng Tao, Xiaogang Wang

World models are transitioning from passive visual generators to foundational, operational infrastructure for Physical AI: they must natively acquire world knowledge from heterogeneous experience, maintain persistent states over long horizons, and execute efficiently within real deployment constraints. We introduce Kairos, a native world model stack designed around these requirements. (1) Kairos learns the world through a Native Pre-training Paradigm governed by a Cross-Embodiment Data Curriculum, which organizes open-world videos, human behavioral data, and robot interactions into a progressive developmental pathway. (2) Kairos maintains the world by unifying world understanding, generation, and prediction within a Native Unified Architecture equipped with Hybrid Linear Temporal Attention, in which sliding-window attention captures local dynamics, dilated sliding windows capture mid-range dependencies, and gated linear attention maintains persistent global memory. We establish formal theoretical bounds demonstrating that this temporal factorization strictly limits error accumulation, mathematically guaranteeing state propagation across extended horizons. (3) Kairos runs the world through a Deployment-Aware System Co-Design that supports low-latency rollout generation on server and consumer-grade hardware for real-world observation-action-feedback loops. Experiments on embodied world-model, long-horizon, and action-policy benchmarks show that Kairos achieves top-level performance while offering a strong efficiency-capability trade-off. Together, these results position Kairos as a cohesive operational foundation for future self-evolving physical intelligence.
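
The three temporal branches named in point (2) can be illustrated with a minimal NumPy sketch. This is not the Kairos implementation: the function names, the window and dilation values, the scalar per-step gate, and in particular the simple averaging of the three branch outputs are all assumptions made for illustration, since the abstract does not say how the branches are parameterized or combined.

```python
import numpy as np

def sliding_window_mask(T, window, dilation=1):
    """Causal mask: query t sees keys t, t-d, ..., t-(window-1)*d (d = dilation)."""
    q = np.arange(T)[:, None]
    k = np.arange(T)[None, :]
    offset = q - k
    return (offset >= 0) & (offset < window * dilation) & (offset % dilation == 0)

def masked_attention(q, k, v, mask):
    """Softmax attention restricted to the positions allowed by `mask`."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def gated_linear_attention(q, k, v, gate):
    """Recurrent linear attention: S_t = g_t * S_{t-1} + k_t v_t^T, o_t = q_t S_t.

    `gate` is a hypothetical scalar decay per time step in [0, 1].
    """
    T, d = q.shape
    state = np.zeros((d, v.shape[-1]))  # fixed-size memory, independent of T
    out = np.empty((T, v.shape[-1]))
    for t in range(T):
        state = gate[t] * state + np.outer(k[t], v[t])
        out[t] = q[t] @ state
    return out

def hybrid_temporal_attention(q, k, v, gate, window=4, dilation=4):
    """Combine local, mid-range, and global branches (averaging is an assumption)."""
    T = len(q)
    local = masked_attention(q, k, v, sliding_window_mask(T, window))
    mid = masked_attention(q, k, v, sliding_window_mask(T, window, dilation))
    glob = gated_linear_attention(q, k, v, gate)
    return (local + mid + glob) / 3.0
```

In this sketch the two windowed branches touch at most `window` keys per query, and the gated branch carries a fixed-size state matrix forward in time, so the work per time step does not grow with sequence length if the masks are exploited sparsely. The dense masks used here are for readability only.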
