In an era where visual content generation is increasingly driven by machine learning, the integration of human feedback into generative models presents significant opportunities for enhancing user experience and output quality. This study explores strategies for incorporating iterative human feedback into the generative process of diffusion-based text-to-image models. We propose FABRIC, a training-free approach applicable to a wide range of popular diffusion models, which exploits the self-attention layer present in the most widely used architectures to condition the diffusion process on a set of feedback images. To ensure a rigorous assessment of our approach, we introduce a comprehensive evaluation methodology, offering a robust mechanism to quantify the performance of generative visual models that integrate human feedback. Through exhaustive analysis, we show that generation results improve over multiple rounds of iterative feedback, implicitly optimizing arbitrary user preferences. The potential applications of these findings extend to fields such as personalized content creation and customization.
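The abstract states only that FABRIC uses the self-attention layer to condition the diffusion process on feedback images; it does not spell out the mechanism. As a rough illustration, the sketch below assumes one plausible reading: the tokens of the image being generated attend both to themselves and to tokens extracted from the feedback images, by concatenating the two sets before computing keys and values. The function name `self_attention_with_feedback`, its parameters, and the single-head NumPy formulation are hypothetical and are not taken from the authors' implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention_with_feedback(hidden, feedback_hidden, w_q, w_k, w_v):
    """Single-head self-attention whose keys/values also cover feedback tokens.

    hidden:          (n, d) tokens of the image currently being denoised
    feedback_hidden: (m, d) tokens from the feedback images (assumed to be
                     taken from the same layer; m may be 0)
    w_q, w_k, w_v:   (d, d) projection matrices of the pretrained layer
    """
    # Queries come only from the image being generated.
    q = hidden @ w_q
    # Assumption: keys and values are computed over generated + feedback tokens.
    context = np.concatenate([hidden, feedback_hidden], axis=0)
    k = context @ w_k
    v = context @ w_v
    # Standard scaled dot-product attention over the extended context.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights
```

Because the pretrained projection matrices are reused unchanged, a scheme of this kind needs no additional training, which is consistent with the training-free claim. With an empty feedback set it reduces to ordinary self-attention.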