Scientific posters present the contributions of scientific papers effectively in a graphical format. However, creating a well-designed poster that efficiently summarizes the core of a paper is both labor-intensive and time-consuming. A system that can automatically generate well-designed posters from scientific papers would reduce the workload of authors and help readers grasp the outline of a paper visually. Despite the demand for poster generation systems, only limited research has been conducted due to the lack of publicly available datasets. Thus, in this study, we built the SciPostLayout dataset, which consists of 7,855 scientific posters with manual layout annotations for layout analysis and generation. SciPostLayout also contains 100 scientific papers paired with their posters. All of the posters and papers in our dataset are under the CC-BY license and are publicly available. As benchmark tests for the collected dataset, we conducted experiments on layout analysis and generation using existing computer vision models and found that both tasks are more challenging for the posters in SciPostLayout than for scientific papers. We also conducted experiments on generating layouts from scientific papers to demonstrate the potential of using LLMs as a scientific poster generation system. The dataset is publicly available at https://huggingface.co/datasets/omron-sinicx/scipostlayout_v2. The code is also publicly available at https://github.com/omron-sinicx/scipostlayout.