Accurate shape and trajectory estimation of dynamic objects is essential for reliable automated driving. Classical Bayesian extended-object models offer theoretical robustness and efficiency but depend on the completeness of their a priori and update-likelihood functions, whereas deep learning methods offer adaptability at the cost of dense annotations and high compute. We combine these strengths in LEO (Learned Extension of Objects), a spatio-temporal Graph Attention Network that fuses multi-modal, production-grade sensor tracks to learn adaptive fusion weights, enforce temporal consistency, and represent multi-scale shapes. Using a task-specific parallelogram ground-truth formulation, LEO models complex geometries (e.g., articulated trucks and trailers) and generalizes across sensor types, configurations, object classes, and regions, remaining robust for challenging and long-range targets. Evaluations on the Mercedes-Benz DRIVE PILOT SAE L3 dataset demonstrate real-time computational efficiency suitable for production systems; additional validation on public datasets such as View of Delft (VoD) further confirms cross-dataset generalization.
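To make the idea of adaptive fusion weights concrete, the following minimal sketch shows softmax-normalized attention weights used to combine per-sensor track estimates. It is an illustration under stated assumptions, not LEO's architecture: the function names `softmax` and `fuse_tracks`, the two-element (length, width) feature layout, and the numeric values are all hypothetical, and the scores that a trained network would produce are passed in by hand here.

```python
import math


def softmax(scores):
    """Normalize raw attention scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_tracks(track_features, scores):
    """Return (weights, fused), where fused is the weighted sum of the
    per-sensor feature vectors under softmax weights.

    Assumption: in a learned model the scores would come from an
    attention layer; here they are plain inputs.
    """
    weights = softmax(scores)
    dim = len(track_features[0])
    fused = [
        sum(w * feat[i] for w, feat in zip(weights, track_features))
        for i in range(dim)
    ]
    return weights, fused


# Hypothetical (length, width) estimates in metres from three sensor tracks.
tracks = [[4.6, 1.9], [4.2, 1.8], [5.0, 2.1]]

# Equal scores reduce the fusion to a plain average.
uniform_weights, uniform_fused = fuse_tracks(tracks, [0.0, 0.0, 0.0])

# A higher score for the first track pulls the fused estimate toward it.
skewed_weights, skewed_fused = fuse_tracks(tracks, [2.0, 0.0, 0.0])
```

With equal scores every track contributes equally; raising one score shifts weight toward that track, which is the behavior an attention mechanism can learn per object and per time step.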