OsmLocator: locating overlapping scatter marks with a non-training generative perspective

Automated mark localization in scatter images is greatly helpful for discovering knowledge, understanding enormous collections of document images, and reasoning in visual question answering AI systems. It is a highly challenging problem because overlapping marks are ubiquitous. Locating overlapping marks faces many difficulties, such as no texture, little contextual information, hollow shapes, and tiny sizes. Here, we formulate it as a combinatorial optimization problem on clustering-based re-visualization from a non-training generative perspective: scatter marks are located by finding the state of multiple variables at which an objective function reaches its minimum. The objective function is constructed from the difference between a binarized scatter image and the corresponding re-visualization generated from its clustering. Fundamentally, re-visualization tries to generate a new scatter graph taking only a rasterized scatter image as input, and clustering is employed to provide the information for such re-visualization. This method can stably locate severely overlapping, variable-size, and variable-shape marks in scatter images without depending on any training dataset or reference. We also propose an adaptive variant of simulated annealing that can work on various connected regions. In addition, we built a dataset named SML2023 containing hundreds of scatter images with different markers and various levels of overlapping severity, tested the proposed method on it, and compared it with existing methods. The results show that it can accurately locate most marks in scatter images with different overlapping severity and marker types, with about a 0.3 absolute increase on an assignment-cost-based metric compared with state-of-the-art methods. This work is of value to data mining on massive web pages and literature, and sheds new light on image measurement tasks such as bubble counting.
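The idea of minimizing a difference between a binarized image and its re-visualization can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a known number of marks `k`, filled circular marks of a known `radius`, and plain (non-adaptive) simulated annealing over the mark centers, which stand in for cluster centers. The function names `render`, `objective`, and `anneal` are hypothetical and chosen only for illustration.

```python
import numpy as np

def render(centers, radius, shape):
    """Re-visualize: draw a filled disc of the given radius at each center."""
    yy, xx = np.mgrid[0:shape[0], 0:shape[1]]
    canvas = np.zeros(shape, dtype=bool)
    for cy, cx in centers:
        canvas |= (yy - cy) ** 2 + (xx - cx) ** 2 <= radius ** 2
    return canvas

def objective(binary, centers, radius):
    """Count pixels where the re-visualization disagrees with the boolean input."""
    return int(np.count_nonzero(binary ^ render(centers, radius, binary.shape)))

def anneal(binary, k, radius, steps=2000, t0=50.0, seed=0):
    """Plain simulated annealing over k centers (assumption: k and radius known)."""
    rng = np.random.default_rng(seed)
    fg = np.argwhere(binary)  # foreground pixel coordinates as (row, col)
    centers = fg[rng.choice(len(fg), size=k, replace=False)].astype(float)
    cost = objective(binary, centers, radius)
    best, best_cost = centers.copy(), cost
    for step in range(steps):
        temp = t0 * (1.0 - step / steps) + 1e-9  # linear cooling schedule
        cand = centers.copy()
        cand[rng.integers(k)] += rng.normal(scale=1.5, size=2)  # perturb one center
        cand_cost = objective(binary, cand, radius)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if cand_cost <= cost or rng.random() < np.exp((cost - cand_cost) / temp):
            centers, cost = cand, cand_cost
            if cost < best_cost:
                best, best_cost = centers.copy(), cost
    return best, best_cost

# Two overlapping synthetic marks (hypothetical test data).
truth = np.array([[20.0, 20.0], [20.0, 26.0]])
image = render(truth, 5, (40, 48))
found, cost = anneal(image, k=2, radius=5)
print(found, cost)
```

In this toy setting the objective is zero exactly when the rendered discs reproduce the input image, so a lower cost means the candidate centers explain the overlapping region better. Variable mark shapes, variable sizes, and an unknown number of marks per connected region would all need additional search variables that this sketch leaves out.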
