The ability of generative AI (GenAI) methods to photorealistically alter camera images has heightened awareness of the authenticity of images shared online. By contrast, images captured directly by our cameras are generally presumed to be authentic and faithful. However, with the increasing integration of deep-learning modules into cameras' capture-time hardware, namely the image signal processor (ISP), images output directly by our cameras may now contain hallucinated content. Hallucinated capture-time content is typically benign, such as enhanced edges or texture, but in certain operations, such as AI-based digital zoom or low-light image enhancement, hallucinations can alter the semantics and interpretation of the image content. As a result, users may not realize that the content in their camera images is not authentic. This paper addresses the issue by enabling users to recover the 'unhallucinated' version of a camera image and so avoid misinterpreting its content. Our approach optimizes an image-specific multi-layer perceptron (MLP) decoder together with a modality-specific encoder so that, given the camera image, we can recover the image as it was before hallucinated content was added. The encoder and MLP decoder are self-contained and can be applied to the image post-capture without requiring access to the camera ISP. Moreover, together they require only 180 KB of storage and can be readily saved as metadata within standard image formats such as JPEG and HEIC.
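To make the idea of an image-specific MLP concrete, the following is a minimal NumPy sketch, not the architecture described in this paper. It assumes that the pre-hallucination image is available at capture time, omits the modality-specific encoder entirely, and fits a tiny per-pixel MLP that predicts the residual between the camera image and its pre-hallucination counterpart. All names (`pixel_features`, `fit_residual_mlp`, `recover`, `storage_bytes`), the layer sizes, and the float16 storage assumption are hypothetical choices made for illustration.

```python
import numpy as np

def init_mlp(rng, sizes):
    """He-initialised weights and zero biases for a fully connected net."""
    return [(rng.normal(0.0, np.sqrt(2.0 / m), (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """Return the network output and the input to every layer (for backprop)."""
    acts = [x]
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)  # ReLU on hidden layers only
        acts.append(x)
    return x, acts

def pixel_features(camera):
    """Per-pixel inputs: normalised (x, y) coordinates plus the camera RGB."""
    h, w, _ = camera.shape
    ys, xs = np.mgrid[0:h, 0:w]
    coords = np.stack([xs / max(w - 1, 1), ys / max(h - 1, 1)], axis=-1)
    return np.concatenate([coords, camera], axis=-1).reshape(-1, 5)

def fit_residual_mlp(camera, reference, hidden=32, steps=300, lr=1e-2, seed=0):
    """Fit an MLP to one image so that camera + MLP(features) approximates
    the reference (assumed here to be the pre-hallucination image)."""
    rng = np.random.default_rng(seed)
    feats = pixel_features(camera)
    target = (reference - camera).reshape(-1, 3)
    params = init_mlp(rng, [5, hidden, hidden, 3])
    losses = []
    for _ in range(steps):
        out, acts = forward(params, feats)
        losses.append(float(np.mean((out - target) ** 2)))
        grad = 2.0 * (out - target) / out.size  # gradient of the MSE
        for i in reversed(range(len(params))):
            w_i, b_i = params[i]
            gw = acts[i].T @ grad
            gb = grad.sum(axis=0)
            grad = grad @ w_i.T
            if i > 0:
                grad = grad * (acts[i] > 0)  # back through the ReLU
            params[i] = (w_i - lr * gw, b_i - lr * gb)
    return params, losses

def recover(params, camera):
    """Apply the stored MLP post-capture; only the camera image is needed."""
    residual, _ = forward(params, pixel_features(camera))
    return camera + residual.reshape(camera.shape)

def storage_bytes(params, dtype=np.float16):
    """Size of the weights if serialised at the given precision (assumed)."""
    count = sum(w.size + b.size for w, b in params)
    return count * np.dtype(dtype).itemsize
```

In this sketch, `recover` needs only the stored weights and the camera image, which mirrors the post-capture, ISP-free usage described above. The toy network is far smaller than the 180 KB budget reported in the abstract and is not expected to reproduce the paper's results.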