This paper presents a graph cascade for sentence segmentation of XML documents. Our proposal supports nested sentences (sentences inside sentences) for cases introduced by quotation marks and hyphens, and pays particular attention to parenthetical insertions introduced by parentheses and to lists introduced by colons. We describe how the tool works, compare its results with those available in 2019 on the same dataset, and evaluate the system's performance on a test corpus.