Future instance prediction from a Bird's Eye View (BEV) perspective is a vital component of autonomous driving and involves both future instance segmentation and instance motion prediction. Existing methods usually rely on a redundant and complex pipeline that requires multiple auxiliary outputs and post-processing procedures. Moreover, estimation errors in each auxiliary prediction degrade the final prediction performance. In this paper, we propose a simple yet effective fully end-to-end framework named Future Instance Prediction Transformer (FipTR), which views the task as BEV instance segmentation and prediction for future frames. We adopt instance queries, each representing a specific traffic participant, to directly estimate the corresponding future occupancy masks, thereby eliminating complex post-processing procedures. In addition, we devise a flow-aware BEV predictor for future BEV feature prediction, built on a flow-aware deformable attention that uses backward flow to guide offset sampling. A novel future instance matching strategy is also proposed to further improve temporal coherence. Extensive experiments demonstrate the superiority of FipTR and its effectiveness under different temporal BEV encoders.
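To make the notion of backward flow concrete, the following is a minimal NumPy sketch of backward-flow-guided sampling on a BEV feature map. It is not the flow-aware deformable attention of FipTR, whose details the abstract does not give. It only illustrates the convention assumed here: each cell of the future grid stores a displacement that points back to the location in the current features from which it should sample. The function name `warp_with_backward_flow`, the `(C, H, W)` layout, and the use of bilinear interpolation with border clipping are all assumptions made for illustration.

```python
import numpy as np


def warp_with_backward_flow(bev, flow):
    """Sample current BEV features at p + flow[p] for every future cell p.

    Hypothetical helper for illustration only.
    bev:  (C, H, W) feature map of the current frame.
    flow: (2, H, W) backward flow in cells; flow[0] is the column (x)
          displacement and flow[1] is the row (y) displacement.
    """
    _, height, width = bev.shape
    rows, cols = np.meshgrid(np.arange(height), np.arange(width), indexing="ij")

    # Source coordinates, clipped so that sampling stays inside the grid.
    src_x = np.clip(cols + flow[0], 0, width - 1)
    src_y = np.clip(rows + flow[1], 0, height - 1)

    # Integer corners and fractional weights for bilinear interpolation.
    x0 = np.floor(src_x).astype(int)
    y0 = np.floor(src_y).astype(int)
    x1 = np.minimum(x0 + 1, width - 1)
    y1 = np.minimum(y0 + 1, height - 1)
    wx = src_x - x0
    wy = src_y - y0

    top = bev[:, y0, x0] * (1 - wx) + bev[:, y0, x1] * wx
    bottom = bev[:, y1, x0] * (1 - wx) + bev[:, y1, x1] * wx
    return top * (1 - wy) + bottom * wy
```

In this sketch, an object that moves one cell to the right between frames corresponds to a backward flow of -1 in x: the future cell looks one cell to the left to find its features. A deformable attention layer would presumably use such flow-shifted locations as the centers around which learned offsets are sampled, rather than performing a single fixed warp as done here.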