We propose "Conceptual Coverage Across Languages" (CoCo-CroLa), a technique for benchmarking the degree to which a generative text-to-image (T2I) system provides multilingual parity with its training language in terms of tangible nouns. For each model, we assess the "conceptual coverage" of a given target language relative to a source language by comparing the population of images generated for a series of tangible nouns in the source language with the population of images generated for each noun's translation in the target language. This technique lets us estimate how well suited a model is to a target language and identify model-specific weaknesses, spurious correlations, and biases without a priori assumptions. We demonstrate how it can be used to benchmark T2I models in terms of multilinguality, and how, despite its simplicity, it is a good proxy for impressive generalization.
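The population comparison described above can be illustrated with a minimal sketch. It assumes that each generated image has already been mapped to a feature vector by some image encoder, and it scores a concept by the mean cosine similarity between the source-language and target-language image populations. The similarity measure, the function names `mean_cross_similarity` and `conceptual_coverage`, and the nested-dictionary layout are all illustrative assumptions, not the metric or interface defined by this work.

```python
import numpy as np


def mean_cross_similarity(source_feats, target_feats):
    """Mean cosine similarity between every source/target image pair.

    Both arguments are arrays of shape (n_images, feature_dim).
    Assumption: cosine similarity is used purely for illustration.
    """
    s = source_feats / np.linalg.norm(source_feats, axis=1, keepdims=True)
    t = target_feats / np.linalg.norm(target_feats, axis=1, keepdims=True)
    return float((s @ t.T).mean())


def conceptual_coverage(features, source_lang, target_lang):
    """Score each concept for target_lang relative to source_lang.

    `features[lang][concept]` is a hypothetical layout holding the
    feature array of all images generated for that concept's word
    in that language.
    """
    scores = {}
    for concept, src in features[source_lang].items():
        tgt = features[target_lang][concept]
        scores[concept] = mean_cross_similarity(src, tgt)
    return scores


if __name__ == "__main__":
    # Synthetic features stand in for real image embeddings.
    rng = np.random.default_rng(0)
    dog_en = rng.normal(size=(4, 8))
    features = {
        "en": {"dog": dog_en},
        # A well-covered concept: target images resemble the source ones.
        "es": {"dog": dog_en + 0.01 * rng.normal(size=(4, 8))},
    }
    print(conceptual_coverage(features, "en", "es"))
```

Under these assumptions, a concept whose target-language images resemble its source-language images scores higher than one whose translated prompt yields unrelated images, so low per-concept scores flag candidate coverage gaps for inspection.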