Predicting pedestrian motion is essential for developing socially aware robots that interact in crowded environments. Although the natural visual perspective in a social interaction setting is egocentric, most existing work on trajectory prediction has been conducted purely in the top-down trajectory space. To support research on first-person view trajectory prediction, we present T2FPV, a method for constructing high-fidelity first-person view (FPV) datasets from a real-world, top-down trajectory dataset. We showcase our approach on the ETH/UCY pedestrian dataset, generating egocentric visual data for all interacting pedestrians and creating the T2FPV-ETH dataset. In this setting, FPV-specific errors arise from imperfect detection and tracking, occlusions, and the limited field of view (FOV) of the camera. To address these errors, we propose CoFE, a module that further refines the imputation of missing data in an end-to-end manner with trajectory forecasting algorithms. Our method reduces the impact of such FPV errors on downstream prediction performance, decreasing displacement error by more than 10% on average. To facilitate research engagement, we release our T2FPV-ETH dataset and software tools.
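As a rough illustration of two ideas mentioned above, the sketch below shows how a limited field of view can hide parts of a top-down track from an ego pedestrian, and how an average displacement error can be computed. It is not the T2FPV or CoFE implementation: the function names, the 90-degree FOV, and the 20 m range are hypothetical choices made for this example, and occlusion and detection or tracking errors are not modeled.

```python
import math


def in_fov(ego_xy, ego_heading, other_xy, fov_deg=90.0, max_range=20.0):
    """Return True if other_xy lies inside the ego agent's horizontal view cone.

    ego_heading is in radians, measured from the +x axis. fov_deg and
    max_range are illustrative assumptions, not values from the paper.
    """
    dx = other_xy[0] - ego_xy[0]
    dy = other_xy[1] - ego_xy[1]
    dist = math.hypot(dx, dy)
    if dist == 0.0 or dist > max_range:
        return False
    # Bearing of the other agent relative to the ego heading, wrapped to [-pi, pi).
    bearing = math.atan2(dy, dx) - ego_heading
    bearing = (bearing + math.pi) % (2.0 * math.pi) - math.pi
    return abs(bearing) <= math.radians(fov_deg) / 2.0


def average_displacement_error(pred, truth):
    """Mean Euclidean distance between predicted and ground-truth positions."""
    if len(pred) != len(truth) or not pred:
        raise ValueError("pred and truth must be non-empty and of equal length")
    total = sum(math.hypot(px - tx, py - ty)
                for (px, py), (tx, ty) in zip(pred, truth))
    return total / len(pred)


# Ego pedestrian at the origin, facing along +x; another pedestrian's top-down track.
track = [(2.0, 0.0), (2.0, 1.0), (0.0, 2.0), (-1.0, 0.0)]
visible = [in_fov((0.0, 0.0), 0.0, p) for p in track]

# Steps where visible is False are the gaps an imputation module would need to fill.
ade = average_displacement_error([(0.0, 0.0), (3.0, 4.0)], [(0.0, 0.0), (0.0, 0.0)])
```

In this toy track the last two positions fall outside the view cone, so a first-person observer would see only a partial trajectory. Those missing steps are the kind of gap that imputation is meant to fill before forecasting.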