Visual chart recognition systems are attracting growing attention, driven by the demand for automatically extracting table headers and values from chart images. Current methods use keypoint detection to estimate the shapes of data elements in charts, but they suffer from grouping errors in post-processing. To address this issue, we propose ChartDETR, a transformer-based multi-shape detector that localizes keypoints at the corners of regular shapes to reconstruct multiple data elements in a single chart image. By introducing query groups into set prediction, our method predicts all data element shapes at once and eliminates the need for further post-processing. This property lets ChartDETR serve as a unified framework that represents various chart types without altering the network architecture and effectively detects data elements of diverse shapes. We evaluate ChartDETR on three datasets and achieve competitive results across all chart types without any additional enhancements. For example, ChartDETR achieves an F1 score of 0.98 on Adobe Synthetic, significantly outperforming the previous best model, which scored 0.71. We also obtain a new state-of-the-art result of 0.97 on ExcelChart400k. The code will be made publicly available.
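To illustrate why query groups can remove the keypoint-grouping step, the following is a minimal sketch, not ChartDETR's actual implementation. It assumes (hypothetically) that the decoder emits a flat list of keypoint predictions in which every consecutive block of `group_size` queries belongs to one data element, and that an element is kept when the mean of its keypoint scores passes a threshold. The function name, the flat layout, and the scoring rule are all assumptions made for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def shapes_from_query_groups(
    points: List[Point],
    scores: List[float],
    group_size: int,
    threshold: float = 0.5,
) -> List[List[Point]]:
    """Read shapes directly off grouped query outputs.

    Assumption (hypothetical layout): queries are ordered so that
    indices [g * group_size, (g + 1) * group_size) all describe the
    same data element, e.g. the two corners of one bar.
    """
    if group_size <= 0 or len(points) % group_size != 0:
        raise ValueError("number of points must be a multiple of group_size")
    if len(points) != len(scores):
        raise ValueError("points and scores must have the same length")

    shapes: List[List[Point]] = []
    for start in range(0, len(points), group_size):
        group_points = points[start:start + group_size]
        group_scores = scores[start:start + group_size]
        # Assumed scoring rule: average the keypoint confidences of the group.
        if sum(group_scores) / group_size >= threshold:
            shapes.append(group_points)
    return shapes


# Two groups of two keypoints each (top-left and bottom-right of a bar).
example_points = [(10.0, 20.0), (30.0, 80.0), (0.0, 0.0), (1.0, 1.0)]
example_scores = [0.9, 0.8, 0.1, 0.2]
bars = shapes_from_query_groups(example_points, example_scores, group_size=2)
```

Because group membership is fixed by query index, no keypoint-matching heuristic runs after prediction; under these assumptions, the only remaining step is confidence filtering.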