Content generated by recent Text-to-Image (T2I) diffusion models is sometimes too imaginative for existing off-the-shelf predictors of semantic properties to handle, because the domain gap between generated and real images is hard to bridge. We introduce DMP, a pipeline that uses pre-trained T2I models as a prior for pixel-level semantic prediction tasks. To address the mismatch between deterministic prediction tasks and stochastic T2I models, we reformulate the diffusion process as a sequence of interpolations, establishing a deterministic mapping between input RGB images and output prediction distributions. To preserve generalizability, we fine-tune the pre-trained models with low-rank adaptation. Extensive experiments across five tasks, including 3D property estimation, semantic segmentation, and intrinsic image decomposition, demonstrate the efficacy of the proposed method. Despite being trained on limited-domain data, the approach yields faithful estimations for arbitrary images, surpassing existing state-of-the-art algorithms.
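To make the idea of a diffusion process built from interpolations concrete, the following is a minimal sketch rather than the DMP implementation. It assumes a linear blending weight between the input image and the target prediction map; the abstract does not specify the schedule, and the names `interpolate`, `deterministic_reverse`, and `predict_target` are hypothetical. The point it illustrates is that, once sampled noise is replaced by the input image, the reverse chain becomes a deterministic function of that image.

```python
import numpy as np


def interpolate(image, target, t, num_steps):
    """Blend the target map (t = 0) toward the input image (t = num_steps).

    Assumption: a linear weight t / num_steps. The actual schedule used by
    DMP is not given in the abstract.
    """
    w = t / num_steps
    return (1.0 - w) * target + w * image


def deterministic_reverse(image, predict_target, num_steps):
    """Walk from the input image back to a prediction without sampling noise.

    `predict_target(z, t)` is a hypothetical stand-in for the fine-tuned
    network: it estimates the target map from the intermediate state z.
    """
    z = image.copy()  # at t = num_steps the chain is the image itself
    for t in range(num_steps, 0, -1):
        target_hat = predict_target(z, t)
        # Re-interpolate one step closer to the estimated target.
        z = interpolate(image, target_hat, t - 1, num_steps)
    return z
```

With an oracle `predict_target` that always returns the ground-truth map, the chain ends exactly at that map, and repeated runs on the same image give identical outputs. In practice the predictor would be the T2I backbone adapted with low-rank updates, which this sketch does not model.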