This paper examines the limitations of advanced text-to-image models in accurately rendering unconventional concepts that are scarcely represented in, or absent from, their training datasets. We show how these limitations not only confine the creative potential of these models but also risk reinforcing stereotypes. To address these challenges, we introduce the Inpaint Biases framework, which employs user-defined masks and inpainting techniques to enhance the accuracy of image generation, particularly for novel or inaccurately rendered objects. Through experimental validation, we demonstrate that this framework significantly improves the fidelity of generated images to the user's intent, thereby expanding the models' creative capabilities and mitigating the risk of perpetuating biases. Our study contributes to the advancement of text-to-image models as unbiased, versatile tools for creative expression.