We address unsupervised discontinuous constituency parsing, where we observe high variance in the performance of the only previous model in the literature. To stabilize and boost performance, we propose building an ensemble from different runs of the existing discontinuous parser by averaging their predicted trees. We first provide a comprehensive computational complexity analysis of tree averaging under different settings of binarity and continuity, classifying each as in P or NP-complete. We then develop an efficient exact algorithm for the task, which runs in reasonable time for all samples in our experiments. Results on three datasets show that our method outperforms all baselines on all metrics; we also provide in-depth analyses of our approach.
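To make the idea of averaging predicted trees concrete, the following is a minimal sketch and not the exact algorithm described above. It assumes that a tree is represented as a set of constituents, that each constituent is a set of word indices (so gaps are allowed for discontinuous constituents), and that the "average" is the tree with the highest mean F1 against the ensemble members. For simplicity, the sketch only selects among the ensemble members themselves rather than searching over all possible trees. All names (`f1`, `average_f1`, `select_consensus`, `run_a`, `run_b`, `run_c`) are hypothetical.

```python
from typing import FrozenSet, List, Set

# Assumption: a constituent is the set of word indices it covers.
# A discontinuous constituent simply has gaps, e.g. {0, 2}.
Constituent = FrozenSet[int]
Tree = Set[Constituent]


def f1(pred: Tree, ref: Tree) -> float:
    """Constituent-level F1 between two trees given as constituent sets."""
    overlap = len(pred & ref)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def average_f1(candidate: Tree, ensemble: List[Tree]) -> float:
    """Mean F1 of a candidate tree against every tree in the ensemble."""
    return sum(f1(candidate, tree) for tree in ensemble) / len(ensemble)


def select_consensus(ensemble: List[Tree]) -> Tree:
    """Pick the ensemble member with the highest mean F1 to all members.

    This is a simplification: it restricts the search to the given trees
    instead of searching the full space of possible trees.
    """
    return max(ensemble, key=lambda tree: average_f1(tree, ensemble))


def c(*indices: int) -> Constituent:
    """Small helper to build a constituent from word indices."""
    return frozenset(indices)


# Three hypothetical parser runs over a 4-word sentence (indices 0..3).
run_a: Tree = {c(0, 1, 2, 3), c(0, 2, 3), c(0, 2)}  # contains discontinuous spans
run_b: Tree = {c(0, 1, 2, 3), c(0, 2, 3), c(2, 3)}
run_c: Tree = {c(0, 1, 2, 3), c(0, 1), c(2, 3)}

consensus = select_consensus([run_a, run_b, run_c])
```

In this toy example, `run_b` shares two of its three constituents with each of the other runs, so it has the highest mean F1 and is selected. Searching over all trees instead of only the ensemble members is where the complexity questions raised in the abstract come into play.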