In-betweening is a technique for generating transitions between given initial and target character states. Most existing methods require multiple input frames (often more than 10), which are not always available. Our work addresses a focused yet challenging problem: generating the transition when given exactly two frames (only the first and the last). To cope with this scenario, we adopt a bidirectional scheme that generates forward and backward transitions from the start and end frames with two adversarial autoregressive networks, and stitches them in the middle of the transition, where there is no strict ground truth. The autoregressive networks, based on conditional variational autoencoders (CVAEs), are optimized by searching for a pair of optimal latent codes that minimizes a novel stitching loss between their outputs. Results show that our method achieves higher motion quality and greater diversity than existing methods on both the LaFAN1 and Human3.6M datasets.
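The bidirectional generate-then-stitch idea can be illustrated with a toy sketch. Everything below is an assumption for illustration, not the authors' implementation: `toy_step` is a hypothetical stand-in for one CVAE decoder step (no neural network and no adversarial training), the latent code has the same dimension as the pose, the stitching loss is assumed to be the squared distance between the forward and backward predictions of one shared middle frame (the abstract does not define the actual loss), and the "search" is an exhaustive scan over a small set of sampled codes (the abstract does not specify the search procedure). The names `rollout`, `stitching_loss`, and `two_frame_inbetween` are likewise hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Pose = List[float]  # toy character state: a flat vector of joint values


def toy_step(prev: Sequence[float], goal: Sequence[float],
             z: Sequence[float], steps_left: int) -> Pose:
    """Hypothetical stand-in for one CVAE decoder step (not the paper's network).

    Covers 1/steps_left of the remaining distance to `goal` (so z = 0 yields
    linear interpolation) and adds a latent-dependent offset 0.1 * z.
    """
    alpha = 1.0 / steps_left
    return [p + alpha * (g - p) + 0.1 * zi for p, g, zi in zip(prev, goal, z)]


def rollout(start: Sequence[float], goal: Sequence[float], z: Sequence[float],
            n_frames: int, total_steps: int) -> List[Pose]:
    """Autoregressively generate n_frames poses from `start` toward `goal`."""
    frames: List[Pose] = []
    cur = list(start)
    for k in range(n_frames):
        cur = toy_step(cur, goal, z, total_steps - k)  # feed output back in
        frames.append(cur)
    return frames


def stitching_loss(fwd: List[Pose], bwd: List[Pose]) -> float:
    """Assumed loss: squared distance between both predictions of the middle frame."""
    return sum((a - b) ** 2 for a, b in zip(fwd[-1], bwd[-1]))


def two_frame_inbetween(start: Sequence[float], end: Sequence[float],
                        n_between: int, n_candidates: int = 16, seed: int = 0
                        ) -> Tuple[List[Pose], float, Tuple[int, int]]:
    """Bidirectional in-betweening from only the first and last frames.

    Scans all (forward, backward) pairs of sampled latent codes, keeps the pair
    with the lowest stitching loss, and splices the two halves at the middle.
    """
    if n_between < 2:
        raise ValueError("need at least two in-between frames")
    rng = random.Random(seed)
    codes = [[rng.gauss(0.0, 1.0) for _ in start] for _ in range(n_candidates)]
    total = n_between + 1   # number of steps from the start frame to the end frame
    half = n_between // 2   # index of the shared (stitching) frame
    # Forward rollouts cover frames 1..half; backward rollouts cover
    # frames n_between..half in reverse time, so both end on frame `half`.
    fwds = [rollout(start, end, z, half, total) for z in codes]
    bwds = [rollout(end, start, z, n_between - half + 1, total) for z in codes]
    best_loss, best_pair = float("inf"), (0, 0)
    for i, f in enumerate(fwds):
        for j, b in enumerate(bwds):
            loss = stitching_loss(f, b)
            if loss < best_loss:
                best_loss, best_pair = loss, (i, j)
    fwd, bwd = fwds[best_pair[0]], bwds[best_pair[1]]
    # Average the two predictions of the shared frame, then join the halves.
    meet = [(a + b) / 2 for a, b in zip(fwd[-1], bwd[-1])]
    middle = fwd[:-1] + [meet] + list(reversed(bwd[:-1]))
    return [list(start)] + middle + [list(end)], best_loss, best_pair


if __name__ == "__main__":
    seq, loss, pair = two_frame_inbetween([0.0, 0.0], [1.0, 2.0], n_between=6)
    print(len(seq), round(loss, 4), pair)
```

Precomputing one forward and one backward rollout per code keeps the pair search at O(n) rollouts plus O(n²) cheap loss evaluations. In a real system the two rollouts would come from trained CVAE decoders, and the loss and search strategy would follow the paper's definitions rather than the placeholders assumed here.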