Recently, significant progress has been made in human action recognition and behavior prediction using deep learning techniques, leading to improved vision-based semantic understanding. However, high-quality motion datasets for small bio-robots are still lacking, even though such robots present more challenging scenarios for long-term movement prediction and behavior control based on third-person observation. In this study, we introduce RatPose, a bio-robot motion prediction dataset constructed according to predefined annotation rules that account for the influence of individual and environmental factors. To make motion prediction more robust to these factors, we propose a Dual-stream Motion-Scenario Decoupling (\textit{DMSD}) framework that separates scenario-oriented and motion-oriented features and is trained jointly with a scenario contrast loss and a motion clustering loss. With this architecture, the two feature streams interact and compensate for each other in a decomposition-then-fusion manner. We demonstrate significant performance improvements of the proposed \textit{DMSD} framework on tasks of different difficulty levels. We also implement long-term discretized trajectory prediction tasks to verify the generalization ability of the proposed dataset.