The systematic evaluation and understanding of computer vision models under varying conditions require large amounts of data with comprehensive and customized labels, which real-world vision datasets rarely satisfy. While current synthetic data generators offer a promising alternative, particularly for embodied AI tasks, they often fall short for computer vision tasks due to low asset and rendering quality, limited diversity, and unrealistic physical properties. We introduce the BEHAVIOR Vision Suite (BVS), a set of tools and assets to generate fully customized synthetic data for systematic evaluation of computer vision models, based on the newly developed embodied AI benchmark, BEHAVIOR-1K. BVS supports a large number of adjustable parameters at the scene level (e.g., lighting, object placement), the object level (e.g., joint configuration, attributes such as "filled" and "folded"), and the camera level (e.g., field of view, focal length). Researchers can arbitrarily vary these parameters during data generation to perform controlled experiments. We showcase three example application scenarios: systematically evaluating the robustness of models across different continuous axes of domain shift, evaluating scene understanding models on the same set of images, and training and evaluating simulation-to-real transfer for a novel vision task: unary and binary state prediction. Project website: https://behavior-vision-suite.github.io/