Explanatory images play a pivotal role in accessible and easy-to-read (E2R) texts. However, the images available in online databases are not tailored to the respective texts, and creating customized images is expensive. In this large-scale study, we investigated whether text-to-image generation models can close this gap by providing customizable images quickly and easily. We benchmarked seven image generation models (four open-source and three closed-source) and extensively evaluated the resulting images. In addition, we conducted a user study with people from the E2R target group to examine whether the images met their requirements. We found that some of the models perform remarkably well, but none is ready to be used at scale without human supervision. Our research is an important step toward helping E2R creators produce accessible information and toward tailoring accessible images to the target group's needs.