Scientific posters play a vital role in academic communication by presenting ideas through visual summaries. Analyzing reading order and parent-child relations of posters is essential for building structure-aware interfaces that facilitate clear and accurate understanding of research content. Despite their prevalence in academic communication, posters remain underexplored in structural analysis research, which has primarily focused on papers. To address this gap, we constructed SciPostLayoutTree, a dataset of approximately 8,000 posters annotated with reading order and parent-child relations. Compared to an existing structural analysis dataset, SciPostLayoutTree contains more instances of spatially challenging relations, including upward, horizontal, and long-distance relations. As a solution to these challenges, we develop Layout Tree Decoder, which incorporates visual features as well as bounding box features including position and category information. The model also uses beam search to predict relations while capturing sequence-level plausibility. Experimental results demonstrate that our model improves the prediction accuracy for spatially challenging relations and establishes a solid baseline for poster structure analysis. The dataset is publicly available at https://huggingface.co/datasets/omron-sinicx/scipostlayouttree. The code is also publicly available at https://github.com/omron-sinicx/scipostlayouttree.
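To illustrate why beam search can help capture sequence-level plausibility when predicting parent-child relations, the following is a minimal sketch and not the Layout Tree Decoder itself. It assumes that elements are already indexed in reading order and that each element's parent is either a virtual root or an earlier element. The function `beam_search_parents`, the scorer `toy_step_logprob`, and all probabilities are hypothetical and chosen only for illustration; in the actual model, scores would come from visual and bounding box features.

```python
import math
from typing import Callable, List, Tuple

ROOT = -1  # hypothetical marker for "attached to the virtual root"

# A scorer returns log P(parent of element i | parents chosen so far).
StepScorer = Callable[[List[int], int, int], float]


def beam_search_parents(
    n: int, step_logprob: StepScorer, beam_width: int = 3
) -> Tuple[List[int], float]:
    """Pick one parent per element, keeping the `beam_width` best prefixes.

    Assumption: element i may only attach to ROOT or to an element j < i.
    """
    beams: List[Tuple[float, List[int]]] = [(0.0, [])]
    for i in range(n):
        candidates: List[Tuple[float, List[int]]] = []
        for score, parents in beams:
            for p in [ROOT] + list(range(i)):
                new_score = score + step_logprob(parents, i, p)
                candidates.append((new_score, parents + [p]))
        # Keep only the highest-scoring partial sequences.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam_width]
    best_score, best_parents = beams[0]
    return best_parents, best_score


def toy_step_logprob(prefix: List[int], i: int, parent: int) -> float:
    """Made-up scores in which a locally weaker choice pays off later."""
    if i == 0:
        return 0.0  # the first element can only attach to ROOT
    if i == 1:
        return math.log(0.6 if parent == ROOT else 0.4)
    # i == 2: the distribution depends on what was decided for element 1.
    if prefix[1] == 0:
        probs = {ROOT: 0.05, 0: 0.05, 1: 0.9}
    else:
        probs = {ROOT: 0.4, 0: 0.3, 1: 0.3}
    return math.log(probs[parent])


greedy_parents, greedy_score = beam_search_parents(3, toy_step_logprob, beam_width=1)
beam_parents, beam_score = beam_search_parents(3, toy_step_logprob, beam_width=2)
print(greedy_parents, round(math.exp(greedy_score), 2))
print(beam_parents, round(math.exp(beam_score), 2))
```

With a beam width of 1 the search is greedy and commits to the locally best parent for element 1, which leads to a lower-probability sequence overall. With a beam width of 2 the alternative prefix survives long enough for the higher-probability sequence to be found.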