Long-Horizon (LH) tasks in Human-Scene Interaction (HSI) are complex multi-step tasks that require continuous planning, sequential decision-making, and extended execution across domains to reach a final goal. However, existing methods rely heavily on skill chaining, which concatenates pre-trained subtasks, and they tightly couple environment observations with self-state. As a result, they cannot generalize to new combinations of environments and skills and fail to complete diverse LH tasks across domains. To address this problem, this paper presents ALAS, a cross-domain learning framework for LH tasks based on biologically inspired dual-stream disentanglement. Inspired by the brain's "where-what" dual-pathway mechanism, ALAS comprises two core modules: i) an environment learning module for spatial understanding, which captures object functions, spatial relationships, and scene semantics and achieves cross-domain transfer through complete environment-self disentanglement; and ii) a skill learning module for task execution, which processes self-state information, including joint degrees of freedom and motor patterns, and enables cross-skill transfer through independent motor pattern encoding. We conduct extensive experiments on various LH tasks in HSI scenes. Compared with existing methods, ALAS improves the average subtask success rate by 23\% and the average execution efficiency by 29\%.