Text-to-chart retrieval, which enables users to find relevant charts via natural language queries, has gained significant attention. However, evaluating models in real-world business intelligence (BI) scenarios is challenging, as current benchmarks neither simulate realistic user queries nor test deep semantic understanding of static chart images. To address this gap, we introduce CRBench, the first benchmark sourced from real-world BI, comprising 21,862 charts and 326 queries and employing a Target-and-Distractor paradigm to evaluate discriminative retrieval among highly similar candidates. Testing on CRBench reveals that existing methods, which rely primarily on visual features, perform poorly and fail to capture the rich analytical semantics of charts. To address this performance bottleneck, we propose a semantic-insight synthesis pipeline that automatically generates three hierarchical levels of insights for charts: visual patterns, statistical properties, and practical applications. Using this pipeline, we produced 207,498 semantic insights for 69,166 charts as training data. By leveraging this data to bridge the gap between natural language query intent and latent visual representations via multi-level semantic supervision, we develop ChartFinder, a specialized model capable of deep cross-modal reasoning. Experimental results show that ChartFinder significantly outperforms state-of-the-art methods on CRBench, achieving up to 66.9% NDCG@10 for precise queries (an 11.58% improvement) and an average gain of 5% across nearly all metrics for fuzzy queries. This work provides the community with a much-needed benchmark for realistic evaluation and demonstrates a powerful data synthesis paradigm for enhancing a model's semantic understanding of charts.
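The headline metric, NDCG@10, rewards rankings that place relevant charts near the top, with gains discounted logarithmically by rank. A minimal sketch of the standard definition follows (an illustration only, not the benchmark's evaluation code; function names are our own):

```python
import math

def dcg(relevances):
    # Discounted cumulative gain: each item's relevance is discounted
    # by log2 of its 1-based rank plus one.
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

def ndcg_at_k(relevances, k=10):
    # Normalized DCG@k: DCG of the top-k ranked list divided by the DCG
    # of the ideal (descending-relevance) ordering truncated at k.
    ideal_dcg = dcg(sorted(relevances, reverse=True)[:k])
    if ideal_dcg == 0:
        return 0.0
    return dcg(relevances[:k]) / ideal_dcg
```

A ranking that places the single relevant chart first scores 1.0; pushing it lower in the list decreases the score, which is why NDCG@10 is well suited to the Target-and-Distractor setting, where one target must be ranked above many near-duplicate distractors.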