Advances in foundation models have catalyzed Embodied AI research on interactive agents that can reason about and act in their environments. Developing such agents requires diverse, large-scale datasets. Prior frameworks generate synthetic data for long-term human-robot interactions but fail to model the bidirectional influence between human behavior and household environments. We propose a generative framework that creates household datasets at scale through loosely coupled generation of long-term human-robot interactions and environments: human personas influence environment generation, while environment schematics and semantics shape human-robot interactions. The generated 3D data includes rich static context, such as object and environment semantics, and temporal context capturing human and agent behaviors over extended periods. Users configure the environment and human activity data through natural language prompts, and the tool creates variations of these user-defined configurations, enabling scalable data generation. We validate the framework statistically using multi-modal embeddings and key metrics: cosine similarity, mutual information gain, intervention analysis, and iterative improvement validation. Statistical comparisons show good alignment with real-world datasets (HOMER; cosine similarity 0.60), while synthetic datasets (Wang et al.) show moderate alignment (0.27). Intervention analysis across changes in age, organization, and sleep pattern shows statistically significant effects (p < 0.001) with medium to large effect sizes (Cohen's d = 0.51-1.12), confirming that the bidirectional coupling translates persona traits into measurable environmental and behavioral differences. These contributions enable the development and testing of household smart devices at scale.
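Two of the quantities reported in the abstract, cosine similarity between embeddings and Cohen's d for intervention effects, have standard definitions that can be sketched in a few lines. The following is a minimal illustration using only the Python standard library. The function names and the toy inputs are assumptions made for this example; the abstract does not specify the embedding model, the pooling of embeddings, or the exact variant of Cohen's d used, so the pooled-standard-deviation form below is only one common choice.

```python
import math
from statistics import mean, variance


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (hypothetical helper)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def cohens_d(x, y):
    """Cohen's d with a pooled standard deviation (assumed variant).

    x and y are samples of some scalar measurement taken before and
    after an intervention, for example a persona change.
    """
    nx, ny = len(x), len(y)
    # statistics.variance is the sample variance (n - 1 denominator).
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(pooled_var)


if __name__ == "__main__":
    # Toy values for illustration only; they are not data from the paper.
    generated = [0.2, 0.7, 0.1]
    reference = [0.3, 0.6, 0.2]
    print("cosine similarity:", cosine_similarity(generated, reference))

    after_intervention = [2.0, 4.0, 6.0]
    baseline = [1.0, 3.0, 5.0]
    print("Cohen's d:", cohens_d(after_intervention, baseline))
```

Under Cohen's conventional thresholds (roughly 0.2 small, 0.5 medium, 0.8 large), a d between 0.51 and 1.12 spans the medium to large range.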