Text-to-image generation models have advanced rapidly, yet achieving fine-grained control over generated images remains difficult, largely due to limited understanding of how semantic information is encoded. We develop an interpretation of the color representation in the Variational Autoencoder latent space of FLUX.1 [Dev], revealing a structure reflecting Hue, Saturation, and Lightness. We verify our Latent Color Subspace (LCS) interpretation by demonstrating that it can both predict and explicitly control color, introducing a fully training-free method in FLUX based solely on closed-form latent-space manipulation. Code is available at https://github.com/ExplainableML/LCS.