Generative Texture Diversification of 3D Pedestrians for Robust Autonomous Driving Perception

In recent years, autonomous driving has significantly increased the demand for high-quality data to train 2D and 3D perception models for safety-critical scenarios. Real-world datasets struggle to meet this demand because requirements continuously evolve and large-scale annotated data collection remains costly and time-consuming, making synthetic data a scalable, practical, and controllable alternative. Pedestrian detection is among the most safety-critical tasks in autonomous driving. In this paper, we propose a simple yet effective method for scaling variability in 3D pedestrian assets for synthetic scene generation. Starting from a single 3D base asset, we generate multiple distinct pedestrian instances by synthesizing diverse facial textures and identity-level appearance variations using StyleGAN2 and automatically mapping them onto 3D meshes. This approach enables scalable appearance-level asset diversification without requiring the design of new geometries for each instance. Using these assets, we construct synthetic datasets and study the impact of mixing real and synthetic data for RGB-based object detection. Through complementary experiments, we analyze geometry-driven distribution shifts in point cloud perception for 3D object detection. Our findings demonstrate that controlled synthetic diversification improves robustness in 2D detection while revealing the sensitivity of 3D perception models to geometric domain gaps. Overall, this work highlights how generative AI enables scalable, simulation-ready pedestrian diversification through controlled facial texture synthesis, along with the benefits and limitations of cross-domain training strategies in autonomous driving pipelines.
