Text-to-image generation models have advanced rapidly, yet fine-grained control over generated images remains difficult, largely because the way these models encode semantic information is poorly understood. We develop an interpretation of the color representation in the Variational Autoencoder (VAE) latent space of FLUX.1 [Dev], revealing a structure that reflects Hue, Saturation, and Lightness. We verify our Latent Color Subspace (LCS) interpretation by demonstrating that it can both predict and explicitly control color, and we introduce a fully training-free color-control method for FLUX based solely on closed-form latent-space manipulation. Code is available at https://github.com/ExplainableML/LCS.