Forecasting the motion and spatial positions of objects is of fundamental importance, especially in safety-critical settings such as autonomous driving. In this work, we address the problem by forecasting two modalities that carry complementary information, namely optical flow and depth. To this end, we propose FLODCAST, a flow and depth forecasting model that leverages a multitask recurrent architecture trained to forecast both modalities jointly. We stress the importance of training on flows and depth maps together, demonstrating that both tasks improve when the model is informed of the other modality. We also train the proposed model to predict several future timesteps. This provides better supervision and leads to more precise predictions, while retaining the model's ability to yield outputs autoregressively for any future time horizon. We test our model on the challenging Cityscapes dataset, obtaining state-of-the-art results for both flow and depth forecasting. Thanks to the high quality of the generated flows, we also report benefits on the downstream task of segmentation forecasting, injecting our predictions into a flow-based mask-warping framework.
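To make the idea of flow-based mask warping concrete, the following is a minimal sketch rather than the framework used in this work. It assumes a backward flow convention (each target pixel stores the displacement to its source location in the current mask) and nearest-neighbour sampling so that class labels stay integral. The function name `warp_mask` and the `(dx, dy)` channel order are illustrative assumptions.

```python
import numpy as np

def warp_mask(mask, flow):
    """Warp an integer label mask with a dense flow field (illustrative sketch).

    mask: (H, W) integer array of class labels for the current frame.
    flow: (H, W, 2) array; flow[y, x] = (dx, dy) is ASSUMED to be the
          displacement from a target pixel to its source location in `mask`.
    """
    h, w = mask.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Nearest-neighbour lookup; source coordinates are clamped to the image border.
    src_x = np.clip(np.rint(xs + flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.rint(ys + flow[..., 1]).astype(int), 0, h - 1)
    return mask[src_y, src_x]

# Toy example: a vertical stripe of label 1 in column 1.
mask = np.zeros((4, 4), dtype=int)
mask[:, 1] = 1
# Every target pixel reads from one pixel to its left, so content moves right.
flow = np.zeros((4, 4, 2))
flow[..., 0] = -1.0
warped = warp_mask(mask, flow)
```

In this toy case the stripe ends up in column 2 of `warped`. Applying the same function with a forecasted flow for each future step would propagate a mask over a longer horizon, under the same assumed convention.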