ArK: Augmented Reality with Knowledge Interactive Emergent Ability

Despite the growing adoption of mixed reality and interactive AI agents, it remains challenging for these systems to generate high-quality 2D/3D scenes in unseen environments. The common practice requires deploying an AI agent to collect large amounts of data for model training for every new task. This process is costly, or even impossible, in many domains. In this study, we develop an infinite agent that learns to transfer knowledge memory from general foundation models (e.g., GPT4, DALLE) to novel domains or scenarios for scene understanding and generation in the physical or virtual world. The heart of our approach is an emerging mechanism, dubbed Augmented Reality with Knowledge Inference Interaction (ArK), which leverages knowledge memory to generate scenes in unseen physical-world and virtual-reality environments. The knowledge interactive emergent ability (Figure 1) is demonstrated at two levels: i) cross-modality micro-action, in which multi-modality models collect a large amount of relevant knowledge-memory data from physical reality for each interaction task (e.g., unseen scene understanding); and ii) reality-agnostic macro-behavior, in which interactions in mixed-reality environments are improved by tailoring them to different characterized roles, target variables, collaborative information, and so on. We validate the effectiveness of ArK on scene generation and editing tasks. We show that ArK, combined with large foundation models, significantly improves the quality of generated 2D/3D scenes compared to baselines, demonstrating the potential benefit of incorporating ArK in generative AI for applications such as the metaverse and gaming simulation.
