Diffusion models have recently become the dominant paradigm for image generation, yet existing systems struggle to interpret and follow numeric instructions for adjusting semantic attributes. In real-world creative scenarios that demand precise control over aesthetic attributes, current methods cannot deliver such controllability. This limitation partly arises from the subjective and context-dependent nature of aesthetic judgments, but more fundamentally stems from the fact that current text encoders are designed for discrete tokens rather than continuous values. Meanwhile, work on aesthetic alignment, typically based on reinforcement learning, direct preference optimization, or architectural modifications, primarily aligns models with a global notion of human preference. While these approaches improve user experience, they overlook the multifaceted and compositional nature of aesthetics, underscoring the need for explicit disentanglement and independent control of individual aesthetic attributes. To address this gap, we introduce AttriCtrl, a lightweight framework for continuous aesthetic intensity control in diffusion models. It first defines the relevant aesthetic attributes and then quantifies them through a hybrid strategy that maps both concrete and abstract dimensions onto a unified $[0,1]$ scale. A plug-and-play value encoder then transforms user-specified values into model-interpretable embeddings for controllable generation. Experiments show that AttriCtrl achieves accurate and continuous control over both single and multiple aesthetic attributes, significantly enhancing personalization and diversity. Crucially, it is implemented as a lightweight adapter that keeps the diffusion model frozen, enabling seamless integration with existing frameworks such as ControlNet at negligible computational cost.
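To make the value-encoder idea concrete, the sketch below shows one plausible way a scalar aesthetic intensity in $[0,1]$ could be mapped to an embedding consumed alongside frozen text-encoder outputs. This is a minimal illustration under our own assumptions, not the authors' released implementation: the module name `ValueEncoder`, the layer sizes, and the choice to append the value embedding as an extra conditioning token are all hypothetical.

```python
# Minimal sketch (assumed design, not the paper's code): a plug-and-play value
# encoder mapping a scalar in [0, 1] to a token-like embedding that a frozen
# diffusion backbone's cross-attention could consume together with the prompt.
import torch
import torch.nn as nn


class ValueEncoder(nn.Module):
    """Maps a scalar aesthetic intensity in [0, 1] to one embedding vector."""

    def __init__(self, embed_dim: int = 768, hidden_dim: int = 256):
        super().__init__()
        # Small MLP: the only trainable component; the diffusion model stays frozen.
        self.mlp = nn.Sequential(
            nn.Linear(1, hidden_dim),
            nn.SiLU(),
            nn.Linear(hidden_dim, embed_dim),
        )

    def forward(self, value: torch.Tensor) -> torch.Tensor:
        # value: (batch,) scalars in [0, 1] -> (batch, 1, embed_dim)
        return self.mlp(value.unsqueeze(-1)).unsqueeze(1)


# Usage sketch: append the value embedding to the (frozen) text-encoder output
# and condition generation on the extended sequence.
value_encoder = ValueEncoder(embed_dim=768)
prompt_emb = torch.randn(2, 77, 768)      # stand-in for frozen CLIP text embeddings
intensity = torch.tensor([0.2, 0.9])      # user-specified aesthetic intensities
cond = torch.cat([prompt_emb, value_encoder(intensity)], dim=1)  # (2, 78, 768)
```

Because only the small adapter is trained while the base model and text encoder remain frozen, such a module could in principle be combined with other conditioning mechanisms such as ControlNet at negligible extra cost, which is the integration property the abstract highlights.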