How can diverse, life-like, and unlimitedly long head/body motion sequences be generated without any driving source? We argue that this under-investigated problem is far from trivial and poses unique technical challenges. Without semantic constraints from driving sources, using a standard autoregressive model to generate infinitely long sequences easily leads to 1) out-of-distribution (OOD) issues caused by accumulated error, 2) insufficient diversity to produce natural, life-like motion sequences, and 3) undesired periodic patterns over time. To tackle these challenges, we propose a systematic framework that combines the benefits of VQ-VAE with a novel token-level control policy trained by reinforcement learning using carefully designed reward functions. A high-level prior model can easily be injected on top to generate unlimitedly long and diverse sequences. Although we currently focus on the setting without driving sources, our framework can be generalized to controlled synthesis with explicit driving sources. Comprehensive evaluations show that our framework addresses all of the above challenges and significantly outperforms other strong baselines.
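To make the token-level setting concrete, the following is a minimal sketch under stated assumptions, not the framework's actual implementation. Motion frames are mapped to discrete tokens by nearest-neighbour lookup in a codebook (the VQ step of a VQ-VAE), and tokens are then sampled autoregressively from a prior whose logits are adjusted by a token-level control policy. The names `quantize`, `rollout`, `prior`, and `policy` are hypothetical; the policy here is a hand-written stand-in, and the reinforcement-learning training and reward functions are not shown.

```python
import math
import random
from typing import Callable, List, Sequence

Logits = List[float]


def quantize(frame: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Map a feature vector to the index of its nearest codebook entry (squared L2)."""
    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda k: sq_dist(frame, codebook[k]))


def softmax(logits: Logits) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def rollout(
    prior: Callable[[List[int]], Logits],   # hypothetical prior: history -> logits
    policy: Callable[[List[int]], Logits],  # hypothetical control: history -> logit offsets
    first_token: int,
    length: int,
    rng: random.Random,
) -> List[int]:
    """Sample a token sequence where the policy reshapes the prior's distribution per step."""
    tokens = [first_token]
    while len(tokens) < length:
        base = prior(tokens)
        offsets = policy(tokens)
        probs = softmax([b + o for b, o in zip(base, offsets)])
        tokens.append(rng.choices(range(len(probs)), weights=probs, k=1)[0])
    return tokens
```

As a usage example, a uniform prior over four codes combined with a policy that strongly penalizes repeating the previous token yields a sequence with no immediate repeats. This is a crude illustration of how token-level control can steer an autoregressive prior away from degenerate patterns.

```python
K = 4
uniform_prior = lambda toks: [0.0] * K
no_repeat = lambda toks: [-1e9 if k == toks[-1] else 0.0 for k in range(K)]
seq = rollout(uniform_prior, no_repeat, first_token=0, length=20, rng=random.Random(0))
```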