Multi-task table recognition jointly addresses table structure prediction, cell localization, and cell content recognition within a unified framework. Existing approaches often rely on an autoregressive decoder to generate the table structure and then reuse its hidden states for cell localization and content recognition. Because each hidden state depends only on the tokens generated before it, the resulting cell representations can become order-dependent, which degrades global consistency across cells. This paper proposes a structural refinement module that produces order-independent cell features through non-causal attention. The design allows cell contents to be inferred in parallel while each cell is conditioned on the global context encoded in the refined features. Experiments on two large datasets show consistent gains in cell localization and end-to-end recognition, and overall inference time drops by a factor of roughly three.
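The contrast between causal and non-causal attention can be illustrated with a small sketch. It is not the paper's refinement module: it assumes a single attention head with identity projections and no positional encodings, and the names `self_attention` and `is_order_independent` are hypothetical helpers introduced only for this illustration. Under those assumptions, non-causal attention gives each cell the same feature regardless of the order in which cells are listed, whereas a causal mask ties each cell's feature to its position in the sequence.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(cells, causal):
    """Single-head dot-product self-attention with identity projections.

    cells:  list of feature vectors, one per table cell (toy assumption).
    causal: if True, cell i may only attend to cells 0..i, as in an
            autoregressive decoder; if False, every cell sees every cell.
    """
    dim = len(cells[0])
    scale = 1.0 / math.sqrt(dim)
    out = []
    for i, q in enumerate(cells):
        visible = cells[: i + 1] if causal else cells
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in visible]
        weights = softmax(scores)
        out.append(
            [sum(w * v[d] for w, v in zip(weights, visible)) for d in range(dim)]
        )
    return out


def is_order_independent(cells, perm, causal, tol=1e-9):
    """Check that reordering the input cells merely reorders the outputs."""
    base = self_attention(cells, causal)
    permuted = self_attention([cells[p] for p in perm], causal)
    return all(
        abs(x - y) <= tol
        for p, row in zip(perm, permuted)
        for x, y in zip(base[p], row)
    )


cells = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
perm = [2, 0, 1]  # an arbitrary reordering of the three cells

print(is_order_independent(cells, perm, causal=False))  # True
print(is_order_independent(cells, perm, causal=True))   # False
```

In the causal case the first cell in the sequence attends only to itself, so moving a cell to the front changes its feature. Practical models usually add positional information, so this strict order independence holds only for the simplified setting assumed here.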