Human trajectory prediction is typically posed as a zero-shot generalization problem: a predictor is learnt on a dataset of human motion in training scenes, and then deployed on unseen test scenes. While this paradigm has yielded tremendous progress, it fundamentally assumes that trends in human behavior within the deployment scene are constant over time. As such, current prediction models are unable to adapt to scene-specific transient human behaviors, such as crowds temporarily gathering to see buskers, pedestrians hurrying through the rain and avoiding puddles, or a protest breaking out. We formalize the problem of scene-specific adaptive trajectory prediction and propose a new adaptation approach inspired by prompt tuning called latent corridors. By augmenting the input of any pre-trained human trajectory predictor with learnable image prompts, the predictor can improve in the deployment scene by inferring trends from extremely small amounts of new data (e.g., 2 humans observed for 30 seconds). With less than 0.1% additional model parameters, we see up to 23.9% improvement in average displacement error (ADE) on MOTSynth simulated data and 16.4% ADE improvement on MOT and Wildtrack real pedestrian data. Qualitatively, we observe that latent corridors imbue predictors with an awareness of scene geometry and scene-specific human behaviors that non-adaptive predictors struggle to capture. The project website can be found at https://neerja.me/atp_latent_corridors/.
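To make the reported ADE percentages concrete, the following is a minimal sketch of how average displacement error and a relative improvement are commonly computed. It assumes the standard definition of ADE (mean Euclidean distance between predicted and ground-truth positions over the prediction horizon); the function names and the toy trajectories are hypothetical and are not taken from the paper or its code.

```python
import math


def average_displacement_error(predicted, ground_truth):
    """Mean Euclidean distance between matching predicted and true positions."""
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("trajectories must be non-empty and equally long")
    distances = [math.dist(p, g) for p, g in zip(predicted, ground_truth)]
    return sum(distances) / len(distances)


def relative_improvement(baseline_ade, adapted_ade):
    """Percentage reduction in ADE relative to the baseline predictor."""
    return 100.0 * (baseline_ade - adapted_ade) / baseline_ade


# Toy 2D trajectories (made-up numbers, not results from the paper).
ground_truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
baseline_pred = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]  # non-adaptive predictor
adapted_pred = [(0.0, 0.5), (1.0, 0.5), (2.0, 0.5)]   # predictor after adaptation

baseline_ade = average_displacement_error(baseline_pred, ground_truth)
adapted_ade = average_displacement_error(adapted_pred, ground_truth)
improvement = relative_improvement(baseline_ade, adapted_ade)
```

Under this reading, a figure such as "23.9% ADE improvement" would mean the adapted predictor's ADE is 23.9% lower than that of the same predictor without adaptation.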