iTeach: Interactive Teaching for Robot Perception using Mixed Reality

Robots deployed in the wild often encounter objects and scenes that break pre-trained perception models, yet adapting these models typically requires slow offline data collection, labeling, and retraining. We introduce iTeach, a human-in-the-loop system that enables robots to improve perception continuously as they explore new environments. A human sees the robot's predictions from its own viewpoint and corrects failures in real time, and the resulting human-corrected data drives iterative fine-tuning until performance is satisfactory. A mixed reality headset provides the interface, overlaying predictions in the user's view and enabling lightweight annotation via eye gaze and voice. Instead of tedious frame-by-frame labeling, a human guides the robot to scenes of choice and records short videos while interacting with objects. The human labels only the final frame, and a video segmentation model propagates labels across the sequence, converting seconds of input into dense supervision. The refined model is deployed immediately, closing the loop between human feedback and robot learning. We demonstrate iTeach on Unseen Object Instance Segmentation (UOIS), achieving consistent improvements over a pre-trained MSMFormer baseline on both our collected dataset and the SceneReplica benchmark, where it also leads to higher grasping success. We follow this with a real-world demonstration of grasping unseen objects with a Fetch robot. By combining human judgment, efficient annotation, and on-the-fly refinement, iTeach provides a practical path toward perception systems that generalize robustly in diverse real-world conditions. Project page: https://irvlutd.github.io/iTeach
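
The "label only the final frame, then propagate" step can be illustrated with a minimal Python sketch. This is not the iTeach implementation: the function names (`propagate_from_last_frame`, `toy_step`) are hypothetical, and the toy propagation rule stands in for the video segmentation model, which the abstract does not specify. The sketch only shows the control flow, walking backward from the single labeled frame so that every earlier frame receives a mask.

```python
from typing import Callable, List

Frame = List[List[int]]  # toy grayscale image (rows of pixel intensities)
Mask = List[List[int]]   # per-pixel instance ids, 0 = background

# A propagation step maps (labeled frame, its mask, unlabeled frame) -> new mask.
StepFn = Callable[[Frame, Mask, Frame], Mask]


def propagate_from_last_frame(
    frames: List[Frame], last_mask: Mask, step: StepFn
) -> List[Mask]:
    """Spread the final-frame annotation backward over the whole clip."""
    if not frames:
        raise ValueError("need at least one frame")
    masks: List[Mask] = [[] for _ in frames]
    masks[-1] = last_mask
    # Walk from the second-to-last frame down to the first one.
    for t in range(len(frames) - 2, -1, -1):
        masks[t] = step(frames[t + 1], masks[t + 1], frames[t])
    return masks


def toy_step(src_frame: Frame, src_mask: Mask, dst_frame: Frame) -> Mask:
    """Assumed stand-in for a learned model: keep a pixel's label only if
    its intensity is unchanged between the two frames, else mark background."""
    return [
        [label if a == b else 0 for a, label, b in zip(row_a, row_m, row_b)]
        for row_a, row_m, row_b in zip(src_frame, src_mask, dst_frame)
    ]


if __name__ == "__main__":
    frames = [
        [[1, 2], [3, 4]],
        [[1, 2], [3, 9]],
        [[1, 2], [3, 9]],
    ]
    last_mask = [[1, 0], [0, 2]]  # the only human-provided annotation
    for t, mask in enumerate(propagate_from_last_frame(frames, last_mask, toy_step)):
        print(t, mask)
```

In a real pipeline the `step` argument would wrap a video segmentation model, and the resulting per-frame masks would serve as the dense supervision for fine-tuning.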
