A Pyramidal Histogram Of Characters (PHOC) represents the spatial locations of symbols as a binary vector. The vector is composed of levels, each of which splits a formula into equal-sized regions of one or more types (e.g., rectangles or ellipses). For each region type, this produces a pyramid of overlapping regions, where the first level contains the entire formula and the final level contains the finest-grained regions. In this work, we introduce concentric rectangles as a region type, and analyze whether subsequent PHOC levels encode redundant information by omitting levels from PHOC configurations. As a baseline, we include a bag-of-words PHOC containing only the first, whole-formula level. Finally, using the ARQMath-3 formula retrieval benchmark, we demonstrate that some levels encoded in the original PHOC configurations are redundant, that PHOC models with rectangular regions outperform earlier PHOC models, and that despite their simplicity, PHOC models are surprisingly competitive with the state of the art. PHOC is not math-specific, and might be used for chemical diagrams, charts, or other graphics.
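To make the encoding concrete, the following is a minimal sketch of a PHOC-style binary vector, not the implementation evaluated in this work. It assumes a single region type (vertical strips that split the formula along the x-axis), symbols given as horizontal extents normalized to [0, 1], and a bit that is set whenever a symbol's extent overlaps a region. The names `phoc_vector`, `formula`, and `vocab` are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

# (label, x_min, x_max), with x coordinates normalized to [0, 1].
# This input format is an assumption made for this sketch.
Symbol = Tuple[str, float, float]


def phoc_vector(
    symbols: Sequence[Symbol],
    vocab: Sequence[str],
    levels: Sequence[int],
) -> List[int]:
    """Build a binary vector with one bit per (level, region, vocabulary symbol).

    Level n splits the formula into n equal-width vertical strips, so level 1
    covers the whole formula. Omitting a level from `levels` drops its bits.
    """
    vector: List[int] = []
    for level in levels:
        for region in range(level):
            lo, hi = region / level, (region + 1) / level
            # A symbol is present in a region if its extent overlaps the strip.
            present = {
                label
                for label, x_min, x_max in symbols
                if x_min < hi and x_max > lo
            }
            vector.extend(1 if label in present else 0 for label in vocab)
    return vector


# Toy layout for "x + y" (coordinates are made up for illustration).
formula = [("x", 0.0, 0.3), ("+", 0.35, 0.65), ("y", 0.7, 1.0)]
vocab = ["x", "+", "y"]

bag_of_words = phoc_vector(formula, vocab, levels=[1])     # whole-formula level only
two_levels = phoc_vector(formula, vocab, levels=[1, 2])    # adds left/right halves
print(bag_of_words)
print(two_levels)
```

Passing `levels=[1]` yields the bag-of-words baseline, and leaving a level out of `levels` corresponds to omitting it from a configuration. Other region types, such as concentric rectangles, would replace the strip overlap test with a different region membership test.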