Emotional Support Conversation (ESC) is a typical dialogue setting that can effectively help users mitigate emotional pressure. However, owing to the inherent subjectivity of emotion analysis, current non-manual (automatic) evaluation methods struggle to appraise emotional support capability, and their metrics exhibit low correlation with human judgments. Meanwhile, manual evaluation incurs very high costs. To solve these problems, we propose a novel model, FEEL (Framework for Evaluating Emotional Support Capability with Large Language Models), which employs Large Language Models (LLMs) as evaluators to assess emotional support capability. The model considers multiple evaluative aspects of ESC in order to provide a more comprehensive and accurate evaluation. Additionally, it employs a probability distribution approach for more stable results and integrates an ensemble learning strategy, leveraging multiple LLMs with assigned weights to enhance evaluation accuracy. To appraise the performance of FEEL, we conduct extensive experiments on dialogues produced by existing ESC models. Experimental results demonstrate that our model aligns substantially better with human evaluations than the baselines. Our source code is available at https://github.com/Ansisy/FEEL.
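The two aggregation ideas mentioned above (a probability distribution over scores, and a weighted ensemble of several LLM evaluators) can be illustrated with a minimal sketch. This is not taken from the FEEL repository: the function names `expected_score` and `weighted_ensemble`, the 1 to 5 rating scale, and the reading of "probability distribution approach" as a probability-weighted expected score are all assumptions made for illustration.

```python
from typing import Dict, List


def expected_score(score_probs: Dict[int, float]) -> float:
    """Probability-weighted rating (assumed reading of the 'probability
    distribution approach'): sum of score * P(score), renormalized in case
    the probabilities do not sum exactly to 1."""
    total = sum(score_probs.values())
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    return sum(score * p for score, p in score_probs.items()) / total


def weighted_ensemble(scores: List[float], weights: List[float]) -> float:
    """Weighted average of per-evaluator scores (hypothetical ensemble rule)."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    weight_sum = sum(weights)
    if weight_sum <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(s * w for s, w in zip(scores, weights)) / weight_sum


if __name__ == "__main__":
    # Hypothetical rating distributions from two LLM evaluators (scale 1 to 5).
    evaluator_a = expected_score({3: 0.5, 4: 0.5})
    evaluator_b = expected_score({4: 1.0})
    # Hypothetical weights; how FEEL assigns its weights is not stated here.
    print(weighted_ensemble([evaluator_a, evaluator_b], [3.0, 1.0]))
```

Taking an expectation over the rating distribution yields a continuous score rather than a single sampled integer, which is one plausible reason such an approach would give more stable results.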