Human intelligence and human consciousness emerge gradually during cognitive development. Understanding this development is essential to understanding the human mind and may facilitate the construction of artificial minds with similar properties. Importantly, human cognitive development relies on embodied interactions with the physical and social environment, which is perceived via complementary sensory modalities. These interactions allow the developing mind to probe the causal structure of the world. This is in stark contrast to common machine learning approaches, e.g., for large language models, which merely ``digest'' large amounts of training data passively and are not in control of their sensory inputs. However, computational modeling of the kind of self-determined embodied interactions that lead to human intelligence and consciousness is a formidable challenge. Here we present MIMo, an open-source multi-modal infant model for studying early cognitive development through computer simulations. MIMo's body is modeled after an 18-month-old child and has detailed five-fingered hands. MIMo perceives its surroundings via binocular vision, a vestibular system, proprioception, and touch perception through a full-body virtual skin, while two different actuation models allow control of its body. We describe the design and interfaces of MIMo and provide examples illustrating its use. All code is available at https://github.com/trieschlab/MIMo.