Local railway committees need timely situational awareness after highway-rail grade crossing incidents, yet official Federal Railroad Administration (FRA) investigations can take days to weeks. We present a demo system that populates the Highway-Rail Grade Crossing Incident Data form (Form 57) from news articles in real time. The approach addresses two core challenges: the form is visually irregular and semantically dense, and news text is noisy. To handle both, we design a two-stage pipeline. It first converts Form 57 into a JSON schema using a vision-language model with sample aggregation, and then performs grouped question answering that follows the intent of the form layout to reduce ambiguity. In addition, we build an evaluation dataset by aligning scraped news articles with official FRA records and annotating the retrievable information. We then compare our system against alternative approaches in terms of information retrieval accuracy and coverage.
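The two pipeline stages can be illustrated with a minimal sketch. It assumes that "sample aggregation" means majority voting over several schema drafts sampled from the vision-language model, and that "grouped question answering" means issuing one prompt per group of related form fields; the abstract does not confirm either detail. The function names (`aggregate_schemas`, `build_group_prompts`), the `min_support` threshold, and the field names are hypothetical placeholders, not actual Form 57 item names or the authors' implementation.

```python
from collections import Counter


def aggregate_schemas(samples, min_support=0.5):
    """Merge several sampled schema drafts by majority vote (assumed strategy).

    Each sample maps a field name to {"type": ..., "group": ...}.
    Fields seen in fewer than `min_support` of the samples are dropped.
    """
    n = len(samples)
    field_counts = Counter(name for sample in samples for name in sample)
    merged = {}
    for name, count in field_counts.items():
        if count / n < min_support:
            continue  # likely a spurious field from a single noisy sample
        specs = [sample[name] for sample in samples if name in sample]
        merged[name] = {
            key: Counter(spec[key] for spec in specs).most_common(1)[0][0]
            for key in ("type", "group")
        }
    return merged


def build_group_prompts(schema, article):
    """Build one question-answering prompt per field group (assumed strategy)."""
    groups = {}
    for name, spec in schema.items():
        groups.setdefault(spec["group"], []).append(name)
    prompts = {}
    for group, names in groups.items():
        fields = ", ".join(sorted(names))
        prompts[group] = (
            f"Article:\n{article}\n\n"
            f"Answer these related fields together ({group}): {fields}. "
            "Return null for any field the article does not state."
        )
    return prompts


# Three hypothetical schema drafts; field names are illustrative only.
samples = [
    {
        "incident_date": {"type": "string", "group": "incident"},
        "train_speed": {"type": "integer", "group": "train"},
        "highway_user_speed": {"type": "integer", "group": "highway_user"},
    },
    {
        "incident_date": {"type": "string", "group": "incident"},
        "train_speed": {"type": "string", "group": "train"},
        "highway_user_speed": {"type": "integer", "group": "highway_user"},
        "page_footer": {"type": "string", "group": "incident"},
    },
    {
        "incident_date": {"type": "string", "group": "incident"},
        "train_speed": {"type": "integer", "group": "train"},
        "highway_user_speed": {"type": "integer", "group": "highway_user"},
    },
]

schema = aggregate_schemas(samples)
prompts = build_group_prompts(schema, "A freight train struck a pickup truck ...")
```

In this sketch, voting discards a field that only one draft produced and settles a type disagreement in favor of the majority, while grouping lets the answering model see related fields (for example, everything about the train) in a single prompt instead of resolving each one in isolation.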