With the rapid development of text-to-image generation, accurately assessing the alignment between generated images and their text prompts has become a critical challenge. Existing methods rely on Euclidean metrics, which neglect the structured nature of semantic alignment, and they do not adapt to individual samples. To address these limitations, we propose HyperAlign, an adaptive text-to-image alignment assessment framework based on hyperbolic entailment geometry. First, we extract Euclidean features using CLIP and map them to hyperbolic space. Second, we design a dynamic-supervision entailment modeling mechanism that transforms discrete entailment logic into continuous geometric structure supervision. Finally, we propose an adaptive modulation regressor that uses hyperbolic geometric features to generate sample-level modulation parameters, which adaptively calibrate the Euclidean cosine similarity to predict the final score. HyperAlign achieves highly competitive performance in both single-database evaluation and cross-database generalization, supporting the effectiveness of hyperbolic geometric modeling for image-text alignment assessment.
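The pipeline described in the abstract (Euclidean features mapped to hyperbolic space, then hyperbolic geometric features used to modulate the cosine similarity) can be illustrated with a minimal sketch. The abstract does not specify the hyperbolic model, the curvature, the choice of geometric features, or the form of the regressor, so everything here is an assumption made for illustration: a Poincaré ball with curvature parameter `c = 1`, three hand-picked geometric features, and a single linear layer producing a scale `gamma` and a shift `beta`. The function names (`expmap0`, `poincare_distance`, `modulated_score`) are hypothetical, and random vectors stand in for CLIP features.

```python
import numpy as np

def expmap0(v, c=1.0, eps=1e-8):
    """Exponential map at the origin of a Poincare ball (curvature -c).

    Maps Euclidean (tangent) vectors to points strictly inside the ball.
    """
    sqrt_c = np.sqrt(c)
    norm = np.maximum(np.linalg.norm(v, axis=-1, keepdims=True), eps)
    return np.tanh(sqrt_c * norm) * v / (sqrt_c * norm)

def poincare_distance(x, y, c=1.0, eps=1e-8):
    """Geodesic distance between two points of the Poincare ball."""
    sq_dist = np.sum((x - y) ** 2, axis=-1)
    denom = (1 - c * np.sum(x ** 2, axis=-1)) * (1 - c * np.sum(y ** 2, axis=-1))
    arg = 1 + 2 * c * sq_dist / np.maximum(denom, eps)
    return np.arccosh(arg) / np.sqrt(c)

def cosine_similarity(a, b, eps=1e-8):
    """Row-wise Euclidean cosine similarity."""
    num = np.sum(a * b, axis=-1)
    den = np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1)
    return num / np.maximum(den, eps)

def modulated_score(img_feat, txt_feat, w, b, c=1.0):
    """Calibrate cosine similarity with sample-level parameters.

    Assumed (not from the paper): the geometric features are the hyperbolic
    distance and the two embedding norms; `w` (3x2) and `b` (2,) are the
    weights of a hypothetical linear regressor head.
    """
    cos = cosine_similarity(img_feat, txt_feat)
    h_img, h_txt = expmap0(img_feat, c), expmap0(txt_feat, c)
    geo = np.stack(
        [
            poincare_distance(h_img, h_txt, c),
            np.linalg.norm(h_img, axis=-1),
            np.linalg.norm(h_txt, axis=-1),
        ],
        axis=-1,
    )
    params = geo @ w + b
    gamma = 1.0 + np.tanh(params[..., 0])  # per-sample scale, in (0, 2)
    beta = params[..., 1]                  # per-sample shift
    return gamma * cos + beta

# Usage with random stand-ins for CLIP image/text features.
rng = np.random.default_rng(0)
img = rng.normal(size=(4, 8)) * 0.3
txt = rng.normal(size=(4, 8)) * 0.3
w = rng.normal(size=(3, 2)) * 0.1
b = np.zeros(2)
scores = modulated_score(img, txt, w, b)
```

With zero weights and bias, the modulation reduces to `gamma = 1` and `beta = 0`, so the score falls back to the plain cosine similarity. This makes the sketch easy to sanity-check, although the actual regressor may be parameterized quite differently.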