DiffCamera: Arbitrary Refocusing on Images

The depth-of-field (DoF) effect introduces aesthetically pleasing blur and enhances photographic quality, but it is fixed and difficult to modify once the image has been created. This becomes a problem when the applied blur is undesirable (e.g., the subject is out of focus). To address this, we propose DiffCamera, a model that enables flexible refocusing of an existing image, conditioned on an arbitrary new focus point and a blur level. Specifically, we design a diffusion transformer framework for learning to refocus. Training, however, requires pairs of images of the same scene with different focus planes and bokeh levels, and such pairs are hard to acquire. To overcome this limitation, we develop a simulation-based pipeline that generates large-scale image pairs with varying focus planes and bokeh levels. With the simulated data, we find that training with only a vanilla diffusion objective often leads to incorrect DoF behavior because of the complexity of the task, so a stronger constraint is needed during training. Inspired by the photographic principle that photos taken at different focus planes can be linearly blended into a multi-focus image, we propose a stacking constraint during training to enforce precise DoF manipulation. This constraint imposes physically grounded refocusing behavior: the refocused results must align faithfully with the scene structure and the camera conditions so that they can be combined into the correct multi-focus image. We also construct a benchmark to evaluate the effectiveness of our refocusing model. Extensive experiments demonstrate that DiffCamera supports stable refocusing across a wide range of scenes, providing unprecedented control over DoF adjustments for photography and generative AI applications.
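The stacking principle mentioned in the abstract can be illustrated with a toy example. The sketch below is not the DiffCamera model or its training loss; it is a minimal pure-Python illustration on a 1-D "scanline", under assumptions that do not come from the source: a hypothetical blur model in which the blur radius grows linearly with distance from the focus plane, a simple box filter as the blur kernel, and one-hot blending weights that pick, for each pixel, the image whose focus plane is closest to that pixel's depth. All function names are invented for illustration.

```python
def blur_radius(depth, focus_depth, blur_level):
    """Assumed blur model: radius grows linearly with distance from the focus plane."""
    return blur_level * abs(depth - focus_depth)


def box_blur_1d(signal, radii):
    """Blur each sample with a box filter whose half-width is that sample's radius."""
    n = len(signal)
    out = []
    for i in range(n):
        r = int(round(radii[i]))
        lo, hi = max(0, i - r), min(n, i + r + 1)
        window = signal[lo:hi]
        out.append(sum(window) / len(window))
    return out


def refocus(signal, depth, focus_depth, blur_level):
    """Simulate a shot focused at focus_depth (a crude stand-in for a DoF renderer)."""
    radii = [blur_radius(d, focus_depth, blur_level) for d in depth]
    return box_blur_1d(signal, radii)


def stack(images, depth, focus_depths):
    """Linearly blend a focus stack with per-pixel one-hot weights (an assumption)."""
    out = []
    for i in range(len(depth)):
        dists = [abs(depth[i] - f) for f in focus_depths]
        k = dists.index(min(dists))  # image whose focus plane is nearest to this pixel
        weights = [1.0 if j == k else 0.0 for j in range(len(images))]
        out.append(sum(w * img[i] for w, img in zip(weights, images)))
    return out


def stacking_loss(stacked, reference):
    """Mean squared error between the blended stack and a multi-focus reference."""
    return sum((a - b) ** 2 for a, b in zip(stacked, reference)) / len(reference)


# Toy scene: a sharp alternating pattern with a near region and a far region.
signal = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
depth = [1.0, 1.0, 1.0, 1.0, 3.0, 3.0, 3.0, 3.0]

near = refocus(signal, depth, focus_depth=1.0, blur_level=1.0)  # far half blurred
far = refocus(signal, depth, focus_depth=3.0, blur_level=1.0)   # near half blurred
stacked = stack([near, far], depth, focus_depths=[1.0, 3.0])
loss = stacking_loss(stacked, signal)
```

In this toy setting each refocused image is sharp only in the region at its own focus plane, and blending the two recovers the sharp pattern everywhere. A constraint of this kind penalizes refocused outputs that cannot be combined into the correct multi-focus image, which is the intuition the abstract describes.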
