Robots are becoming increasingly integrated into our lives, assisting us in various tasks. To ensure effective collaboration between humans and robots, it is essential that they understand our intentions and anticipate our actions. In this paper, we propose a Human-Object Interaction (HOI) anticipation framework for collaborative robots, built around an efficient and robust transformer-based model that detects and anticipates HOIs from videos. This anticipation capability empowers robots to proactively assist humans, resulting in more efficient and intuitive collaboration. Our model outperforms the state of the art in HOI detection and anticipation on the VidHOI dataset, improving mAP by 1.76% and 1.04% respectively while being 15.4 times faster. We demonstrate the effectiveness of our approach through experiments on a real robot, showing that the robot's ability to anticipate HOIs is key to better Human-Robot Interaction. More information can be found on our project webpage: https://evm7.github.io/HOI4ABOT_page/