Besides interacting correctly with other vehicles, automated vehicles should also react safely to vulnerable road users such as pedestrians and cyclists. For safe interaction between pedestrians and automated vehicles, the vehicle must be able to interpret the pedestrian's behavior. Common environment models, however, lack information such as body pose, which is needed to understand a pedestrian's intent. In this work, we propose an environment model that includes the position of each pedestrian as well as their pose information. The only inputs to our pedestrian environment model are images from a monocular camera and the vehicle's localization data. We extract skeletal information from the image with a neural-network-based human pose estimator. We then track the skeletons with a simple tracking algorithm based on the Hungarian algorithm and ego-motion compensation. To obtain the 3D position, we aggregate the data from consecutive frames in conjunction with the vehicle position. We demonstrate our pedestrian environment model on data generated with the CARLA simulator and on the nuScenes dataset. Overall, we reach a relative position error of around 16% on both datasets.
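The tracking step mentioned above (Hungarian assignment combined with ego-motion compensation) can be illustrated with a minimal sketch. It is not the authors' implementation. Several simplifying assumptions are made here that do not come from the abstract: tracks and detections are reduced to 2D ground-plane points in the vehicle frame, the ego pose is given as `(x, y, yaw)` in a world frame, the association cost is the Euclidean distance, and the gating threshold `gate` is a hypothetical parameter. The function names `to_current_frame`, `hungarian`, and `match` are likewise illustrative.

```python
import math


def to_current_frame(point, prev_pose, curr_pose):
    """Ego-motion compensation: move a point from the previous vehicle
    frame into the current one. Poses are (x, y, yaw) in a world frame."""
    px, py = point
    x0, y0, yaw0 = prev_pose
    x1, y1, yaw1 = curr_pose
    # Previous vehicle frame -> world frame.
    wx = x0 + math.cos(yaw0) * px - math.sin(yaw0) * py
    wy = y0 + math.sin(yaw0) * px + math.cos(yaw0) * py
    # World frame -> current vehicle frame (inverse rotation).
    dx, dy = wx - x1, wy - y1
    return (math.cos(yaw1) * dx + math.sin(yaw1) * dy,
            -math.sin(yaw1) * dx + math.cos(yaw1) * dy)


def hungarian(cost):
    """Minimum-cost assignment for a square cost matrix, O(n^3).
    Returns a list where entry i is the column assigned to row i."""
    n = len(cost)
    inf = float("inf")
    u = [0.0] * (n + 1)          # row potentials
    v = [0.0] * (n + 1)          # column potentials
    p = [0] * (n + 1)            # p[j]: row matched to column j (1-based)
    way = [0] * (n + 1)          # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:                # flip the matching along the path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assign = [-1] * n
    for j in range(1, n + 1):
        assign[p[j] - 1] = j - 1
    return assign


def match(tracks, detections, prev_pose, curr_pose, gate=2.0):
    """Associate previous-frame tracks with current-frame detections.
    Returns (track_index, detection_index) pairs closer than `gate`."""
    moved = [to_current_frame(t, prev_pose, curr_pose) for t in tracks]
    size = max(len(moved), len(detections))
    if size == 0:
        return []
    # Pad to a square matrix; padded or distant pairs cost exactly `gate`.
    cost = [[gate] * size for _ in range(size)]
    for i, t in enumerate(moved):
        for j, d in enumerate(detections):
            cost[i][j] = min(math.dist(t, d), gate)
    assign = hungarian(cost)
    return [(i, j) for i, j in enumerate(assign)
            if i < len(moved) and j < len(detections) and cost[i][j] < gate]


# The vehicle drove 1 m straight ahead between the two frames.
pairs = match(tracks=[(5.0, 0.0), (10.0, 2.0)],
              detections=[(9.1, 2.0), (4.1, 0.1), (20.0, -3.0)],
              prev_pose=(0.0, 0.0, 0.0), curr_pose=(1.0, 0.0, 0.0))
print(pairs)  # [(0, 1), (1, 0)]
```

Padding the cost matrix to a square and capping each entry at `gate` lets unmatched tracks and new detections fall out of the assignment naturally: any pair whose cost equals `gate` is discarded, so the third detection in the example starts no association.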