Reliable depth estimation under real optical conditions remains a core challenge for camera vision in systems such as autonomous robotics and augmented reality. Despite recent progress in depth estimation and depth-of-field rendering, research remains constrained by the lack of large-scale, high-fidelity, real stereo DSLR datasets, which limits the real-world generalization and evaluation of models trained on synthetic data, as shown extensively in the literature. We present the first high-resolution (5472$\times$3648 px) stereo DSLR dataset, comprising 18000 images that systematically vary focal length and aperture across complex real scenes and capture the optical realism and complexity of professional camera systems. For each of 9 scenes that differ in complexity, lighting, and background, images are captured with two identical camera assemblies at 10 focal lengths (28-70 mm) and 5 apertures (f/2.8-f/22), yielding 50 optical configurations and 2000 images per scene. This full-range optics coverage enables controlled analysis of geometric and optical effects for monocular and stereo depth estimation, shallow depth-of-field rendering, deblurring, 3D scene reconstruction, and novel view synthesis. Each focal configuration has a dedicated calibration image set, supporting the evaluation of classical and learning-based methods for intrinsic and extrinsic calibration. The dataset features challenging visual elements such as multi-scale optical illusions, reflective surfaces, mirrors, transparent glass walls, fine-grained details, and variations in natural and artificial ambient light. This work attempts to bridge the realism gap between synthetic training data and real camera optics, and demonstrates the challenges that real optics pose for current state-of-the-art monocular depth, stereo depth, and depth-of-field methods. We release the dataset, calibration files, and evaluation code to support reproducible research on real-world optical generalization.
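The counts quoted in the abstract can be cross-checked with a short sketch. It uses only the numbers stated above; the index-based configuration grid and the variable names are illustrative assumptions, since the abstract gives the focal-length and aperture ranges but not the individual settings. The per-configuration figure further assumes that images are spread evenly across configurations, which the abstract does not state.

```python
from itertools import product

# Counts taken directly from the abstract.
NUM_SCENES = 9
NUM_FOCAL_LENGTHS = 10   # within 28-70 mm; individual values are not listed
NUM_APERTURES = 5        # within f/2.8-f/22; individual stops are not listed
IMAGES_PER_SCENE = 2000

# Hypothetical index grid standing in for the real (focal length, aperture) pairs.
configs = list(product(range(NUM_FOCAL_LENGTHS), range(NUM_APERTURES)))

num_configs = len(configs)
total_images = NUM_SCENES * IMAGES_PER_SCENE

# Assumption: images are distributed evenly over the optical configurations.
images_per_config = IMAGES_PER_SCENE // num_configs

print(f"optical configurations per scene: {num_configs}")
print(f"total images: {total_images}")
print(f"images per configuration per scene (if evenly split): {images_per_config}")
```

Under these assumptions the totals are mutually consistent: 10 focal lengths times 5 apertures gives the 50 configurations, and 9 scenes at 2000 images each gives the 18000 images reported.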