This paper describes the IUST NLP Lab submission to the Prompting Large Language Models as Explainable Metrics Shared Task at the Eval4NLP 2023 Workshop on Evaluation & Comparison of NLP Systems. We propose a zero-shot, prompt-based strategy for explainable evaluation of summarization using Large Language Models (LLMs). Our experiments, which cover both few-shot and zero-shot prompting, indicate that LLMs are promising evaluation metrics in Natural Language Processing (NLP), particularly for summarization. On the test data for the text summarization task, our best-performing prompts achieve a Kendall correlation of 0.477 with human evaluations. Code and results are publicly available on GitHub.
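The agreement figure above is a Kendall rank correlation between metric scores and human scores. As a minimal sketch of how such a correlation can be computed, the snippet below implements Kendall's tau-b in plain Python. The choice of the tau-b variant, the function name `kendall_tau_b`, and the score lists are assumptions made for illustration; they are not taken from the shared task's evaluation script or from the submission's data.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(xs, ys):
    """Kendall's tau-b between two equally long score lists.

    Assumption: tau-b (the tie-corrected variant) is used here for
    illustration; the source only says "Kendall correlation".
    """
    if len(xs) != len(ys):
        raise ValueError("score lists must have the same length")

    concordant = discordant = tied_x_only = tied_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both rankings: contributes to neither term
        if dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif (dx > 0) == (dy > 0):
            concordant += 1  # both rankings order the pair the same way
        else:
            discordant += 1  # the rankings disagree on this pair

    denom = sqrt(
        (concordant + discordant + tied_x_only)
        * (concordant + discordant + tied_y_only)
    )
    if denom == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return (concordant - discordant) / denom


# Hypothetical scores for five summaries (not data from the paper).
llm_scores = [1, 2, 3, 4, 5]
human_scores = [1, 3, 2, 5, 4]
tau = kendall_tau_b(llm_scores, human_scores)
print(f"Kendall tau-b: {tau:.3f}")
```

In this toy example, eight of the ten summary pairs are ordered the same way by both score lists and two are not, so the correlation is (8 - 2) / 10 = 0.6. The quadratic pairwise loop is fine for small test sets; a library routine would be preferable for large ones.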