Comprehensive Machine Learning Benchmarking for Fringe Projection Profilometry with Photorealistic Synthetic Data

Machine learning approaches for fringe projection profilometry (FPP) are hindered by the lack of large, diverse datasets and standardized benchmarking protocols. This paper introduces the first open-source, photorealistic synthetic dataset for FPP, generated with NVIDIA Isaac Sim and comprising 15,600 fringe images and 300 depth reconstructions across 50 objects. We apply this dataset to single-shot FPP, in which models predict 3D depth maps directly from individual fringe images without temporal phase shifting. Through systematic ablation studies, we identify the best-performing learning configurations for long-range (1.5-2.1 m) depth prediction. We compare three depth normalization strategies and show that individual normalization, which decouples object shape from absolute scale, yields a 9.1x improvement in object reconstruction accuracy over raw depth. We further show that removing background fringe patterns severely degrades performance under all three normalizations, demonstrating that background fringes act as an essential spatial phase reference rather than as noise. We evaluate six loss functions and find that the Hybrid L1 loss performs best. Using the best configuration, we benchmark four architectures and find that UNet achieves the strongest performance, although its errors remain far above the sub-millimeter accuracy of classical FPP. The small performance gap between architectures indicates that the dominant limitation is an information deficit rather than model design: a single fringe image does not carry enough information for accurate depth recovery without explicit phase cues. This work provides a standardized benchmark and evidence motivating hybrid approaches that combine phase-based FPP with learned refinement. The dataset is available at https://huggingface.co/datasets/aharoon/fpp-ml-bench and the code at https://github.com/AnushLak/fpp-ml-bench.
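
To make the idea of individual normalization concrete, the following is a minimal sketch of one way a per-sample scheme could decouple object shape from absolute scale. The abstract does not give the formula, so the min-max form, the use of a foreground mask, and the function names normalize_individual and denormalize_individual are all assumptions for illustration, not the authors' implementation.

```python
import numpy as np


def normalize_individual(depth, mask):
    """Scale one depth map to [0, 1] using only its own object pixels.

    Assumed form: per-sample min-max scaling over the foreground mask.
    Returns the normalized map and the (min, max) needed to invert it.
    """
    obj = depth[mask]
    d_min, d_max = float(obj.min()), float(obj.max())
    scale = max(d_max - d_min, 1e-8)  # guard against a perfectly flat object
    norm = np.zeros_like(depth, dtype=np.float64)
    norm[mask] = (depth[mask] - d_min) / scale
    return norm, (d_min, d_max)


def denormalize_individual(norm, mask, stats):
    """Map a normalized prediction back to metric depth (background stays 0)."""
    d_min, d_max = stats
    depth = np.zeros_like(norm, dtype=np.float64)
    depth[mask] = norm[mask] * (d_max - d_min) + d_min
    return depth


# The same relief placed at two distances inside the 1.5-2.1 m range.
shape = np.array([[0.00, 0.02], [0.05, 0.10]])
mask = np.ones_like(shape, dtype=bool)
near, far = 1.5 + shape, 2.0 + shape

norm_near, stats_near = normalize_individual(near, mask)
norm_far, stats_far = normalize_individual(far, mask)
```

Under this assumed scheme, norm_near and norm_far agree up to floating-point error, so the network target depends only on the object's relief; the absolute offset is carried separately in the stored (min, max) pair and restored by denormalize_individual.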
