The depth-of-field (DoF) effect, which introduces aesthetically pleasing blur, enhances photographic quality, but it is fixed and difficult to modify once an image has been created. This becomes problematic when the applied blur is undesirable~(e.g., the subject is out of focus). To address this, we propose DiffCamera, a model that enables flexible refocusing of a created image conditioned on an arbitrary new focus point and blur level. Specifically, we design a diffusion transformer framework for refocusing learning. However, training requires image pairs of the same scene with different focus planes and bokeh levels, which are hard to acquire. To overcome this limitation, we develop a simulation-based pipeline that generates large-scale image pairs with varying focus planes and bokeh levels. With the simulated data, we find that training with only a vanilla diffusion objective often leads to incorrect DoF behavior due to the complexity of the task, which calls for a stronger constraint during training. Inspired by the photographic principle that photos focused on different planes can be linearly blended into a multi-focus image, we propose a stacking constraint that enforces precise DoF manipulation during training. This constraint improves training by imposing physically grounded refocusing behavior: the refocused results must be faithfully aligned with the scene structure and the camera conditions so that they can be combined into the correct multi-focus image. We also construct a benchmark to evaluate the effectiveness of our refocusing model. Extensive experiments demonstrate that DiffCamera supports stable refocusing across a wide range of scenes, providing unprecedented control over DoF adjustments for photography and generative AI applications.
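To illustrate the stacking principle, a minimal sketch of how such a constraint could be written follows; the blending masks, the per-pixel normalization, and the $\ell_1$ loss below are our own assumptions for exposition, not the paper's exact formulation:
\[
\hat{I}_{\mathrm{stack}} \;=\; \sum_{k=1}^{K} M_k \odot \hat{I}_{\theta}\!\left(I,\, f_k,\, b\right),
\qquad
\mathcal{L}_{\mathrm{stack}} \;=\; \bigl\| \hat{I}_{\mathrm{stack}} - I_{\mathrm{multi}} \bigr\|_{1},
\]
where \(\hat{I}_{\theta}(I, f_k, b)\) denotes the model's refocused output for input image \(I\), focus plane \(f_k\), and blur level \(b\); \(M_k\) is a hypothetical depth-derived blending mask selecting the regions that are sharpest when focused at \(f_k\) (with \(\sum_k M_k = 1\) per pixel, so the combination is a linear blend as the photographic principle requires); and \(I_{\mathrm{multi}}\) is the multi-focus reference image produced by the simulation pipeline.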