Trajectory prediction in autonomous driving has traditionally been studied from a model-centric perspective. However, existing datasets exhibit a strong long-tail distribution in scenario density, where common low-density cases dominate and safety-critical high-density cases are severely underrepresented. This imbalance limits model robustness and hides failure modes when standard evaluations average errors across all scenarios. We revisit trajectory prediction from a data-centric perspective and present Den-TP, a framework for density-aware dataset curation and evaluation. Den-TP first partitions data into density-conditioned regions using agent count as a dataset-agnostic proxy for interaction complexity. It then applies a gradient-based submodular selection objective to choose representative samples within each region while explicitly rebalancing across densities. The resulting subset reduces the dataset size by 50% yet preserves overall performance and significantly improves robustness in high-density scenarios. We further introduce density-conditioned evaluation protocols that reveal long-tail failure modes overlooked by conventional metrics. Experiments on Argoverse 1 and 2 with state-of-the-art models show that robust trajectory prediction depends not only on data scale, but also on balancing scenario density.
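The pipeline described above (density partitioning, per-region submodular selection, rebalancing, and density-conditioned evaluation) can be illustrated with a minimal sketch. The abstract does not specify the exact objective or budget rule, so the following are assumptions made for illustration only: a facility-location objective over cosine similarity of precomputed per-sample gradient embeddings stands in for the gradient-based submodular objective, each density region receives an equal selection budget, and all function names and bin edges are hypothetical.

```python
import numpy as np

def density_bins(agent_counts, edges):
    """Assign each scenario to a density region by its agent count."""
    return np.digitize(agent_counts, edges)

def greedy_facility_location(sim, budget):
    """Greedily maximize F(S) = sum_i max_{j in S} sim[i, j] (sim must be >= 0)."""
    n = sim.shape[0]
    taken = np.zeros(n, dtype=bool)
    best = np.zeros(n)  # how well each sample is covered by the selected set
    selected = []
    for _ in range(min(budget, n)):
        # Marginal gain of adding each candidate column j to the selected set.
        gains = np.maximum(sim, best[:, None]).sum(axis=0) - best.sum()
        gains[taken] = -np.inf
        j = int(np.argmax(gains))
        taken[j] = True
        selected.append(j)
        best = np.maximum(best, sim[:, j])
    return selected

def select_balanced(grad_emb, agent_counts, edges, keep_frac=0.5):
    """Pick representative samples per density region with an equal budget each.

    Assumption: equal budgets are one simple way to rebalance; a region smaller
    than its budget is kept whole, so the subset may fall below keep_frac.
    """
    bins = density_bins(agent_counts, edges)
    regions = np.unique(bins)
    per_region = int(round(keep_frac * len(agent_counts))) // len(regions)
    keep = []
    for r in regions:
        idx = np.flatnonzero(bins == r)
        g = grad_emb[idx]
        g = g / (np.linalg.norm(g, axis=1, keepdims=True) + 1e-12)
        sim = 0.5 * (1.0 + g @ g.T)  # cosine similarity rescaled to [0, 1]
        local = greedy_facility_location(sim, per_region)
        keep.extend(idx[local].tolist())
    return np.array(sorted(keep))

def density_conditioned_error(errors, agent_counts, edges):
    """Report the mean error per density region instead of one global average."""
    bins = density_bins(agent_counts, edges)
    return {int(r): float(errors[bins == r].mean()) for r in np.unique(bins)}
```

In this sketch, a dataset with 80 low-density and 20 high-density scenarios and `keep_frac=0.5` would keep 25 low-density samples and all 20 high-density ones, which raises the high-density share of the subset. Reporting errors per region with `density_conditioned_error` exposes a gap that a single averaged metric would hide.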