Motion forecasting transforms sequences of past movements and environment context into future motion. Recent methods rely on learned representations, resulting in hidden states that are difficult to interpret. In this work, we use natural language to quantize motion features in a human-interpretable way and measure the degree to which they are embedded in hidden states. Our experiments reveal that hidden states of motion sequences are arranged with respect to our discrete sets of motion features. Following these insights, we fit control vectors to motion features, which allow for controlling motion forecasts at inference time. Consequently, our method enables controlling transformer-based motion forecasting models with textual inputs, providing a unique interface to interact with and understand these models. Our implementation is available at https://github.com/kit-mrt/future-motion.
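To make the idea of a control vector concrete, the following is a minimal sketch and not the implementation from the repository above. It assumes a control vector can be approximated as the difference between the mean hidden states of two opposing motion features (here labeled low speed and high speed), which is then added to a hidden state at inference. The function names, the synthetic 8-dimensional hidden states, and the difference-of-means fitting rule are all assumptions made for illustration; the actual fitting procedure may differ.

```python
import random


def mean_vector(rows):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def fit_control_vector(hidden_pos, hidden_neg):
    """Difference of class means (an assumed, simple fitting rule).

    The result points from the 'neg' feature toward the 'pos' feature.
    """
    mu_pos = mean_vector(hidden_pos)
    mu_neg = mean_vector(hidden_neg)
    return [p - q for p, q in zip(mu_pos, mu_neg)]


def apply_control(hidden, control, strength):
    """Shift one hidden state along the control vector.

    A strength of 0 leaves the hidden state unchanged.
    """
    return [h + strength * c for h, c in zip(hidden, control)]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


# Synthetic stand-ins for hidden states (hypothetical data, not model outputs).
rng = random.Random(0)
dim = 8
offset = [1.0 if i == 0 else 0.0 for i in range(dim)]  # hypothetical feature direction
low_speed = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(50)]
high_speed = [[rng.gauss(0, 0.1) + o for o in offset] for _ in range(50)]

# Fit on the two feature groups, then steer a low-speed state toward high speed.
control = fit_control_vector(high_speed, low_speed)
steered = apply_control(low_speed[0], control, strength=1.0)
```

In this sketch, `strength` plays the role of a user-facing knob: positive values move a hidden state toward the first feature, negative values toward the second. In a real model the shifted hidden state would be passed on to the remaining layers to produce the modified forecast.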