Most existing hand motion generation benchmarks for hand-object interaction (HOI) focus on static objects, leaving dynamic scenarios with moving targets and time-critical coordination largely untested. To close this gap, we introduce DynaHOI-Gym, a unified online closed-loop platform with parameterized motion generators and rollout-based metrics for evaluating dynamic capture. Built on DynaHOI-Gym, we release DynaHOI-10M, a large-scale benchmark with 10M frames and 180K hand capture trajectories, whose target motions span 8 major categories and 22 fine-grained subcategories. We also provide a simple observe-before-act baseline (ObAct) that fuses short-term observations with the current frame via spatiotemporal attention to predict actions, improving location success rate by 8.1%.
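The observe-before-act idea can be sketched as a single attention step in which the current frame queries a short window of past observations and the resulting temporal context is fused back into the current-frame feature before action prediction. The sketch below is a minimal NumPy illustration under assumed shapes and a simple averaging fusion; the function and variable names are illustrative, not the paper's actual ObAct implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def observe_before_act(history, current):
    """Fuse a short observation window with the current frame.

    history: (T, D) array of T recent observation embeddings.
    current: (D,) embedding of the current frame.
    Returns a fused (D,) feature to feed an action head.
    """
    d = current.shape[-1]
    # Scaled dot-product attention: the current frame is the query,
    # past observations serve as keys and values.
    scores = history @ current / np.sqrt(d)   # (T,)
    weights = softmax(scores)                 # (T,) sums to 1
    context = weights @ history               # (D,) temporal context
    # Simple fusion (assumption): average context and current frame.
    return 0.5 * (context + current)

# Hypothetical usage: a 4-frame observation window, 8-dim embeddings.
rng = np.random.default_rng(0)
hist = rng.standard_normal((4, 8))
cur = rng.standard_normal(8)
fused = observe_before_act(hist, cur)
```

In practice such a module would operate on learned spatiotemporal features and a trained action decoder; the point here is only the query-over-history structure that lets the policy anticipate a moving target rather than react to the current frame alone.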