Color selection plays a critical role in graphic document design and requires careful consideration of multiple contexts. However, recommending colors that harmonize with the other colors and the textual content of a document is challenging, even for experienced designers. In this study, we propose a multimodal masked color model that integrates color and textual contexts to provide text-aware color recommendation for graphic documents. The model comprises self-attention networks that capture the relationships between colors in multiple palettes and cross-attention networks that incorporate both color and CLIP-based text representations. Our method primarily targets color palette completion, which recommends colors based on the given colors and text. It is also applicable to a second color recommendation task, full palette generation, which generates a complete color palette corresponding to the given text. Experimental results demonstrate that our approach surpasses previous color palette completion methods in accuracy, color distribution, and user experience, and surpasses full palette generation methods in color diversity and similarity to the ground truth palettes.
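To make the masked formulation of palette completion concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes that several palettes are flattened into one token sequence, that hidden colors are replaced by a hypothetical `[MASK]` token, and that palettes are delimited by a hypothetical `[SEP]` token; the abstract does not specify any of these details. The `attend` helper is generic scaled dot-product attention, included only to illustrate the difference between self-attention (colors attending to colors) and cross-attention (colors attending to text representations).

```python
import math

# Hypothetical special tokens; the abstract does not name the actual vocabulary.
MASK, SEP = "[MASK]", "[SEP]"


def build_masked_sequence(palettes, masked):
    """Flatten several palettes into one token sequence, hiding chosen colors.

    palettes: list of palettes, each a list of hex color strings.
    masked:   set of (palette_index, color_index) pairs to hide.
    Returns (tokens, targets); targets maps token position -> hidden color.
    """
    tokens, targets = [], {}
    for p_idx, palette in enumerate(palettes):
        for c_idx, color in enumerate(palette):
            if (p_idx, c_idx) in masked:
                targets[len(tokens)] = color  # what a model would be asked to predict
                tokens.append(MASK)
            else:
                tokens.append(color)
        tokens.append(SEP)  # marks the end of each palette
    return tokens, targets


def attend(queries, keys, values):
    """Generic scaled dot-product attention over lists of float vectors."""
    scale = math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        peak = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        outputs.append([
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))
        ])
    return outputs


# Two illustrative palettes; the second color of the first palette is hidden.
palettes = [["#1a2b3c", "#ffffff"], ["#ff0000"]]
tokens, targets = build_masked_sequence(palettes, {(0, 1)})
```

Under these assumptions, self-attention corresponds to `attend(x, x, x)` over embedded color tokens, while cross-attention corresponds to `attend(color_states, text_states, text_states)`, where `text_states` stands in for CLIP-based text representations. Full palette generation can then be viewed as the case in which every color position is masked and only the text is given.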