Generating realistic human-object interaction (HOI) animations remains challenging because it requires jointly modeling dynamic human actions and diverse object geometries. Prior diffusion-based approaches often rely on hand-crafted contact priors or human-imposed kinematic constraints to improve contact quality. We propose LIGHT, a data-driven alternative in which guidance emerges from the denoising pace itself, reducing dependence on manually designed priors. Building on diffusion forcing, we factor the representation into modality-specific components and assign each component its own noise level under an asynchronous denoising schedule. In this paradigm, cleaner components guide noisier ones through cross-attention, yielding guidance without an auxiliary classifier. We find that this data-driven guidance is inherently contact-aware and can be further strengthened by augmenting training with a broad spectrum of synthetic object geometries, which encourages contact semantics to remain invariant to geometric variation. Extensive experiments show that pace-induced guidance reproduces the benefits of contact priors more effectively than conventional classifier-free guidance, and that LIGHT achieves higher contact fidelity, more realistic HOI generation, and stronger generalization to unseen objects and tasks.
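The abstract does not specify LIGHT's schedule. The following minimal Python sketch illustrates, under stated assumptions, the two ingredients the abstract names: independently sampled per-component noise levels during training (as in diffusion forcing), and a staggered inference schedule in which one component is denoised ahead of another, so that at most steps one component is cleaner than the other. The modality names `object` and `human`, the cosine `alpha_bar` schedule, the fixed per-modality `delays`, and the helpers `noise_components`, `async_schedule`, and `cleanest` are all hypothetical illustrations, not the paper's actual design. No denoising network or cross-attention is implemented here.

```python
import math
import random


def alpha_bar(k: int, T: int) -> float:
    """Cumulative signal level for integer noise level k in [0, T].

    Assumption: the common cosine schedule. k = 0 is clean data and
    k = T is (numerically) pure noise.
    """
    s = 0.008

    def f(t: float) -> float:
        return math.cos((t / T + s) / (1 + s) * math.pi / 2) ** 2

    return f(k) / f(0)


def noise_components(x0, T, rng):
    """Diffusion-forcing-style training noise (hypothetical helper).

    Each modality-specific component gets its OWN independently sampled
    noise level, rather than one level shared by the whole sample.
    """
    levels = {m: rng.randint(0, T) for m in x0}
    noisy = {}
    for m, vec in x0.items():
        a = alpha_bar(levels[m], T)
        noisy[m] = [math.sqrt(a) * v + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0)
                    for v in vec]
    return levels, noisy


def async_schedule(T, delays):
    """Per-step noise levels for each modality under a staggered schedule.

    Assumption: a modality with delay d stays at full noise (level T) for d
    steps, then drops one level per step; the schedule ends once every
    modality has reached level 0.
    """
    total = T + max(delays.values())
    return [{m: max(0, min(T, T - (s - d))) for m, d in delays.items()}
            for s in range(total + 1)]


def cleanest(levels):
    """Return the modality with the lowest noise level at this step.

    In a model of the kind the abstract describes, this component would be
    the cleaner one that the noisier components attend to. Returns None
    when all modalities share the same level (no pace difference).
    """
    lo = min(levels.values())
    leaders = [m for m, k in levels.items() if k == lo]
    return leaders[0] if len(leaders) < len(levels) else None


if __name__ == "__main__":
    rng = random.Random(0)
    x0 = {"object": [0.1, -0.2, 0.3], "human": [0.5, 0.0, -0.5, 1.0]}
    levels, noisy = noise_components(x0, T=1000, rng=rng)
    print("training levels:", levels)
    for step, lv in enumerate(async_schedule(T=6, delays={"object": 0, "human": 2})):
        print(step, lv, "cleaner:", cleanest(lv))
```

In this sketch the object component leads and the human component lags by a fixed number of steps, so the object is strictly cleaner throughout the middle of sampling. Which modality leads, and whether the offset is fixed or learned, are design choices the abstract leaves open.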