Well-designed indoor scenes should prioritize how people can act within a space rather than merely what objects to place. However, existing 3D scene generation methods emphasize visual and semantic plausibility while paying insufficient attention to whether people can comfortably walk, sit, or manipulate objects. To bridge this gap, we present a Behavior-Aware Anthropometric Scene Generation framework. Our approach leverages vision-language models (VLMs) to analyze object-behavior relationships, translating spatial requirements into parametric layout constraints adapted to user-specific anthropometric data. We conducted comparative studies against state-of-the-art models using geometric metrics and a user perception study (N=16), and further conducted in-depth human-scale studies (individuals, N=20; groups, N=18). The results showed improvements over the compared models in task completion time, trajectory efficiency, and human-object manipulation space. This study contributes a framework that bridges VLM-based interaction reasoning with anthropometric constraints, validated through both technical metrics and real-scale human usability studies.
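To make the constraint-translation step concrete, the sketch below shows one way a user-specific anthropometric measurement could be turned into a parametric layout constraint. It is a minimal illustration under assumed conventions, not the paper's implementation: the `AnthropometricProfile` schema, the clearance formula, and the function names `walkway_constraint` and `reach_constraint` are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical anthropometric profile; field names are illustrative,
# not the paper's actual data schema.
@dataclass
class AnthropometricProfile:
    shoulder_breadth_cm: float
    hip_breadth_cm: float
    arm_reach_cm: float

def walkway_constraint(profile: AnthropometricProfile,
                       margin_cm: float = 10.0) -> float:
    """Minimum clear width (cm) for a walkway between two objects:
    the wider of shoulder and hip breadth plus a comfort margin.
    A toy parametric rule standing in for the framework's
    VLM-derived object-behavior constraints."""
    return max(profile.shoulder_breadth_cm, profile.hip_breadth_cm) + margin_cm

def reach_constraint(profile: AnthropometricProfile) -> float:
    """Maximum distance (cm) at which an object counts as
    manipulable from the user's position in this toy model."""
    return profile.arm_reach_cm

# Usage: validate a candidate layout gap against one user's body size.
user = AnthropometricProfile(shoulder_breadth_cm=46.0,
                             hip_breadth_cm=38.0,
                             arm_reach_cm=72.0)
gap_between_sofa_and_table_cm = 62.0
assert gap_between_sofa_and_table_cm >= walkway_constraint(user), \
    "layout violates walkway clearance for this user"
print(f"min walkway width for user: {walkway_constraint(user):.1f} cm")
```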