Evaluating image generators remains challenging because traditional metrics offer little insight into specific image regions. This matters because not all regions of an image are learned with equal ease. In this work, we propose a novel approach that disentangles the cosine similarity of mean embeddings into a product of cosine similarities for individual pixel clusters via centered kernel alignment. Consequently, we can quantify how much each cluster's performance contributes to the overall image generation performance. We demonstrate across various real-world use cases that this improves explainability and makes it more likely that pixel regions of model misbehavior are identified.
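To make the product structure concrete, the following is a minimal numerical sketch and not the method of this work. It assumes a Gaussian RBF kernel (which factorizes across feature groups) and pixel clusters that are statistically independent; the abstract does not state either assumption. Independence is enforced here by building the joint sample as the Cartesian product of per-cluster samples. All function names, cluster sizes, and the `gamma` value are hypothetical.

```python
import numpy as np
from itertools import product


def rbf_gram(a, b, gamma=0.5):
    """Gaussian RBF Gram matrix between the rows of a and the rows of b."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq_dist)


def mean_embedding_cosine(x, y, gamma=0.5):
    """Cosine similarity between the empirical kernel mean embeddings of x and y."""
    kxy = rbf_gram(x, y, gamma).mean()
    kxx = rbf_gram(x, x, gamma).mean()
    kyy = rbf_gram(y, y, gamma).mean()
    return kxy / np.sqrt(kxx * kyy)


def independent_joint(cluster_samples):
    """Joint sample whose empirical distribution is the product of the
    per-cluster empirical distributions (assumption: independent clusters)."""
    rows = [np.concatenate(combo) for combo in product(*cluster_samples)]
    return np.stack(rows)


rng = np.random.default_rng(0)

# Two hypothetical pixel clusters with 3 and 2 features, 6 and 5 samples each.
real_clusters = [rng.normal(size=(6, 3)), rng.normal(size=(5, 2))]
# The "generated" data is shifted more strongly in the second cluster.
fake_clusters = [rng.normal(loc=0.3, size=(6, 3)), rng.normal(loc=1.0, size=(5, 2))]

real = independent_joint(real_clusters)  # shape (30, 5)
fake = independent_joint(fake_clusters)  # shape (30, 5)

overall = mean_embedding_cosine(real, fake)
per_cluster = [
    mean_embedding_cosine(r, f) for r, f in zip(real_clusters, fake_clusters)
]

print("overall cosine similarity:", overall)
print("per-cluster cosine similarities:", per_cluster)
print("product of per-cluster values:", np.prod(per_cluster))
```

Under these assumptions the overall value equals the product of the per-cluster values up to floating-point error, so a low overall score can be traced to the cluster with the smallest factor.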