This study addresses personalized image enhancement, in which input images are enhanced for each user based on that user's preferred images. Previous methods apply the same preferred style to every input image, that is, a single style per user. In contrast, we aim for content-aware personalization, applying a different style to each image according to its content. We make two contributions toward this goal. First, we propose masked style modeling, a method that uses the framework of masked language modeling to predict a style for an input image while taking its content into account. Second, to let the model take image content into account, we propose a novel training scheme in which we download images from Flickr and create pseudo input and retouched image pairs using a degrading model. Quantitative evaluations and a user study show that our method, trained with this scheme, achieves content-aware personalization and outperforms previous methods in this field. Our source code is available at https://github.com/satoshi-kosugi/masked-style-modeling.
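The pseudo-pair construction in the training scheme can be illustrated with a minimal sketch. It assumes that each downloaded photo is treated as the retouched target and that a degraded copy serves as the pseudo input. The `degrade` function below is a hypothetical stand-in (a random gamma and gain change), not the degrading model used in this work, and all names and parameter ranges are chosen for illustration only.

```python
import numpy as np


def degrade(image, rng, max_gamma_shift=0.5, max_gain_shift=0.3):
    """Hypothetical degradation: random gamma and gain, NOT the paper's model."""
    gamma = 1.0 + rng.uniform(-max_gamma_shift, max_gamma_shift)
    gain = 1.0 + rng.uniform(-max_gain_shift, max_gain_shift)
    degraded = gain * np.power(image, gamma)
    # Keep pixel values in the valid [0, 1] range.
    return np.clip(degraded, 0.0, 1.0)


def make_pseudo_pair(retouched, rng):
    """Return (pseudo_input, retouched) for one image with values in [0, 1]."""
    retouched = np.asarray(retouched, dtype=np.float64)
    return degrade(retouched, rng), retouched


rng = np.random.default_rng(0)
photo = rng.random((4, 4, 3))  # placeholder for a downloaded photo
pseudo_input, target = make_pseudo_pair(photo, rng)
```

Under these assumptions, a model would be trained to map `pseudo_input` back toward `target`, so that the style it predicts depends on the content of each image.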