The rapid aging of societies is intensifying demand for autonomous caregiving robots; however, most existing systems are task-specific and rely on handcrafted preprocessing, which limits their ability to generalize across diverse scenarios. A prevailing theory in cognitive neuroscience proposes that the human brain operates through hierarchical predictive processing, which integrates multimodal sensory signals and thereby underlies flexible cognition and behavior. Inspired by this principle, we introduce a hierarchical multimodal recurrent neural network grounded in predictive processing under the free-energy principle that directly integrates more than 30,000-dimensional visuo-proprioceptive inputs without dimensionality reduction. Without task-specific feature engineering, the model learned two representative caregiving tasks: rigid-body repositioning and flexible-towel wiping. We demonstrate three key properties: (i) self-organization of hierarchical latent dynamics that regulate task transitions, capture variability in uncertainty, and infer occluded states; (ii) robustness to degraded vision through visuo-proprioceptive integration; and (iii) asymmetric interference in multitask learning, in which the more variable wiping task had little influence on repositioning, whereas learning the repositioning task modestly reduced wiping performance without compromising the model's overall robustness. Although the evaluation was limited to simulation, these results support predictive processing as a universal and scalable computational principle, pointing toward robust, flexible, and autonomous caregiving robots while offering theoretical insight into how the human brain adapts flexibly in uncertain real-world environments.
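The following is a minimal sketch of what "predictive processing under the free-energy principle" can mean as a training objective. It is not the authors' implementation. It assumes a variational formulation with diagonal Gaussians, in the style of PV-RNN-type models. Under that assumption, free energy is the sum of two terms:

- an accuracy term: the Gaussian negative log-likelihood of each sensory modality, for example a flattened camera image and joint angles;
- a complexity term: the KL divergence between posterior and prior latent distributions at each level of the hierarchy.

The per-layer weights (`meta_priors`) and all function names are hypothetical.

```python
import math


def gaussian_nll(target, mean, log_var):
    """Negative log-likelihood of `target` under a diagonal Gaussian prediction."""
    return sum(
        0.5 * (math.log(2 * math.pi) + lv + (t - m) ** 2 / math.exp(lv))
        for t, m, lv in zip(target, mean, log_var)
    )


def kl_diag_gaussians(mu_q, log_var_q, mu_p, log_var_p):
    """KL(q || p) between two diagonal Gaussians, summed over latent units."""
    return sum(
        0.5 * (lvp - lvq + (math.exp(lvq) + (mq - mp) ** 2) / math.exp(lvp) - 1.0)
        for mq, lvq, mp, lvp in zip(mu_q, log_var_q, mu_p, log_var_p)
    )


def free_energy(modalities, layers, meta_priors):
    """Hypothetical variational free energy for one time step.

    modalities:  dict name -> (target, predicted_mean, predicted_log_var);
                 vision is assumed to be a flattened image vector, so very
                 high-dimensional inputs enter without dimensionality reduction.
    layers:      list of (mu_q, log_var_q, mu_p, log_var_p), one per level.
    meta_priors: list of assumed weights scaling each layer's KL term.
    """
    # Accuracy: prediction error summed over all sensory modalities.
    accuracy = sum(gaussian_nll(t, m, lv) for t, m, lv in modalities.values())
    # Complexity: weighted divergence of posterior from prior at each level.
    complexity = sum(
        w * kl_diag_gaussians(*layer) for w, layer in zip(meta_priors, layers)
    )
    return accuracy + complexity


if __name__ == "__main__":
    modalities = {
        "vision": ([0.0] * 4, [0.0] * 4, [0.0] * 4),  # toy 4-pixel "image"
        "proprio": ([1.0], [1.0], [0.0]),              # one joint angle
    }
    layers = [([1.0], [0.0], [0.0], [0.0])]  # one latent level
    print(free_energy(modalities, layers, meta_priors=[2.0]))
```

Because vision and proprioception both add to the same accuracy term, a noisy or degraded image only raises one part of the error. Predictions can then still be constrained by proprioception, which is one plausible reading of property (ii). Larger `meta_priors` push posterior latents toward the prior, trading sensory accuracy for smoother latent dynamics.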