The evaluation of Natural Language Generation (NLG) models has attracted growing attention, motivating the development of metrics that assess different aspects of generated text. LUNA addresses this challenge by providing a unified interface for 20 NLG evaluation metrics. The metrics are categorized by whether they depend on a reference text and by the type of text representation they use, ranging from string-based n-gram overlap to static embeddings and pre-trained language models. LUNA's straightforward design makes it easy to add new metrics in just a few lines of code, offering a user-friendly tool for evaluating generated texts.
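To illustrate what a unified metric interface of this kind might look like, the following is a minimal sketch in Python. It is not LUNA's actual API: the class names `Metric`, `UnigramF1`, and `DistinctUnigrams`, the method names `evaluate_example` and `evaluate_batch`, and the `requires_reference` flag are all hypothetical and chosen only for illustration. The sketch shows one reference-based and one reference-free string-based metric sharing a base class, so that adding a new metric amounts to implementing a single method.

```python
from abc import ABC, abstractmethod
from collections import Counter
from typing import List, Optional, Sequence


class Metric(ABC):
    """Hypothetical base class for NLG metrics (an assumption, not LUNA's API)."""

    # Whether the metric needs a reference text to score a hypothesis.
    requires_reference: bool = True

    @abstractmethod
    def evaluate_example(self, hypothesis: str, reference: Optional[str] = None) -> float:
        """Score a single generated text."""

    def evaluate_batch(
        self,
        hypotheses: Sequence[str],
        references: Optional[Sequence[Optional[str]]] = None,
    ) -> List[float]:
        """Score many texts by applying evaluate_example to each pair."""
        if references is None:
            references = [None] * len(hypotheses)
        if len(hypotheses) != len(references):
            raise ValueError("hypotheses and references must have the same length")
        return [self.evaluate_example(h, r) for h, r in zip(hypotheses, references)]


class UnigramF1(Metric):
    """Reference-based, string-based metric: F1 over whitespace-token overlap."""

    def evaluate_example(self, hypothesis: str, reference: Optional[str] = None) -> float:
        if reference is None:
            raise ValueError("UnigramF1 requires a reference")
        hyp = Counter(hypothesis.lower().split())
        ref = Counter(reference.lower().split())
        # Counter intersection keeps the minimum count of each shared token.
        overlap = sum((hyp & ref).values())
        if overlap == 0:
            return 0.0
        precision = overlap / sum(hyp.values())
        recall = overlap / sum(ref.values())
        return 2 * precision * recall / (precision + recall)


class DistinctUnigrams(Metric):
    """Reference-free metric: ratio of unique tokens to all tokens."""

    requires_reference = False

    def evaluate_example(self, hypothesis: str, reference: Optional[str] = None) -> float:
        tokens = hypothesis.lower().split()
        return len(set(tokens)) / len(tokens) if tokens else 0.0
```

Under these assumptions, a new metric is a subclass with one `evaluate_example` method, and batch evaluation is inherited from the base class. Embedding-based or language-model-based metrics would fit the same shape, with model loading done in the constructor.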