We introduce DiffBMP, a scalable and efficient differentiable rendering engine for collections of bitmap images. Our work addresses a limitation of traditional differentiable renderers: they are restricted to vector graphics, even though most images in the world are bitmaps. Our core contribution is a highly parallelized rendering pipeline with a custom CUDA implementation for computing gradients. For example, the system can optimize the position, rotation, scale, color, and opacity of thousands of bitmap primitives in under one minute on a consumer GPU. We employ and validate several techniques that facilitate the optimization: soft rasterization via Gaussian blur, structure-aware initialization, a noisy canvas, and specialized losses and heuristics for videos or spatially constrained images. We demonstrate that DiffBMP is not an isolated tool but a practical one designed to integrate into creative workflows: it supports exporting compositions to a native, layered file format, and the entire framework is publicly available as an easy-to-hack Python package.
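The abstract does not spell out how the Gaussian blur enters the rendering pipeline, so the following is only a rough, hedged illustration of why soft rasterization can help position optimization. It is a 1D toy in pure Python, not DiffBMP's CUDA implementation, and every name in it (`gaussian_kernel`, `blur`, `render_box`, `loss`) is hypothetical and not part of the DiffBMP package. It assumes a box-shaped primitive, a sum-of-squares loss, and a blur applied only to the rendered primitive. With a hard-edged primitive that does not overlap the target, moving the primitive by one pixel leaves the loss unchanged, so there is no signal to follow. After blurring, the same move lowers the loss.

```python
import math


def gaussian_kernel(sigma, radius):
    """Normalized 1D Gaussian kernel with 2 * radius + 1 taps."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur(signal, kernel):
    """Convolve with zero padding; the output has the input's length."""
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - radius
            if 0 <= j < n:
                acc += w * signal[j]
        out.append(acc)
    return out


def render_box(center, half_width, n):
    """Hard-edged 1D stand-in for a bitmap primitive: 1 inside, 0 outside."""
    return [1.0 if abs(i - center) <= half_width else 0.0 for i in range(n)]


def loss(center, target, kernel=None, half_width=3):
    """Sum of squared differences between the rendered primitive and target.

    If a kernel is given, the rendered primitive is blurred first
    (an assumed form of soft rasterization, chosen for illustration).
    """
    image = render_box(center, half_width, len(target))
    if kernel is not None:
        image = blur(image, kernel)
    return sum((a - b) ** 2 for a, b in zip(image, target))


N = 128
target = render_box(60, 3, N)                   # target content around pixel 60
kernel = gaussian_kernel(sigma=8.0, radius=24)  # assumed blur parameters

# Change in loss when the primitive moves one pixel toward the target.
hard_delta = loss(41, target) - loss(40, target)
soft_delta = loss(41, target, kernel) - loss(40, target, kernel)

print("hard-edge loss change:", hard_delta)  # exactly 0.0: boxes never overlap
print("blurred loss change:  ", soft_delta)  # negative: moving right helps
```

In this toy, the blur widens the region over which the loss responds to position. That is one plausible reading of why a blur-based soft rasterizer would ease optimization from a poor initialization, although the actual formulation in DiffBMP may differ.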