Interpretability and efficiency are two important considerations for the adoption of neural automatic metrics. In this work, we develop strong-performing automatic metrics for reference-based summarization evaluation, built on a two-stage evaluation pipeline that first extracts basic information units from one text sequence and then checks whether each extracted unit is present in another sequence. The resulting metrics include two-stage metrics that provide high interpretability at both the fine-grained unit level and the summary level, and one-stage metrics that balance efficiency and interpretability. We make the developed tools publicly available through a Python package and on GitHub.
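To make the two-stage idea concrete, the following is a minimal sketch of an extract-then-check pipeline. It is not the released package or its API: the function names (`extract_units`, `check_unit`, `two_stage_score`), the sentence-level notion of a "unit", the token-overlap check, and the 0.6 threshold are all assumptions chosen for illustration, standing in for the neural components the abstract refers to.

```python
import re
from typing import List, Set, Tuple


def extract_units(text: str) -> List[str]:
    """Stage 1 (toy stand-in): treat each sentence as one information unit."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def _tokens(text: str) -> Set[str]:
    """Lowercased alphanumeric tokens, used only by the toy checker below."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def check_unit(unit: str, target_text: str, threshold: float = 0.6) -> bool:
    """Stage 2 (toy stand-in): a unit counts as present when at least
    `threshold` of its distinct tokens also occur in the target text."""
    unit_tokens = _tokens(unit)
    if not unit_tokens:
        return False
    overlap = len(unit_tokens & _tokens(target_text)) / len(unit_tokens)
    return overlap >= threshold


def two_stage_score(source_text: str, target_text: str) -> Tuple[float, List[Tuple[str, bool]]]:
    """Extract units from `source_text`, check each against `target_text`.

    Returns the fraction of units found plus the per-unit judgments,
    which is what makes the score inspectable at the unit level.
    """
    judgments = [(u, check_unit(u, target_text)) for u in extract_units(source_text)]
    if not judgments:
        return 0.0, judgments
    score = sum(ok for _, ok in judgments) / len(judgments)
    return score, judgments


if __name__ == "__main__":
    reference = "The cat sat on the mat. The dog barked loudly."
    summary = "A cat sat on the mat."
    score, judgments = two_stage_score(reference, summary)
    print(score)
    for unit, present in judgments:
        print(present, unit)
```

The per-unit judgments returned alongside the aggregate score illustrate why a two-stage design is interpretable: each contribution to the summary-level score can be traced back to a specific unit. A one-stage metric would instead predict the aggregate directly, trading that unit-level trace for speed.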