While Large Language Models (LLMs) have demonstrated remarkable performance in certain dimensions, their ability to express the implicit language cues that humans use for effective communication remains unclear. This paper presents ExpressivityArena, a Python library for measuring the implicit communication abilities of LLMs. We provide a comprehensive framework to evaluate the expressivity of arbitrary LLMs and explore its practical implications. To this end, we refine the definition and measurement of ``expressivity,'' and apply our framework in a set of small experiments. These experiments test LLMs on creative and logical tasks such as poetry, coding, and emotion-based responses. Model outputs are then evaluated by an automated grader, through ExpressivityArena, which we verify to be the most pragmatic approach for testing expressivity. Building on these experiments, we deepen our understanding of the expressivity of LLMs by assessing their ability to remain expressive in conversations. Our findings indicate that LLMs are capable of generating and understanding expressive content, albeit with some limitations. These insights will inform the future development and deployment of expressive LLMs. We provide the code for ExpressivityArena alongside our paper.
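To make the generate-then-grade pipeline concrete, here is a minimal sketch of an expressivity evaluation loop in the spirit described above: a model is prompted to convey an emotion implicitly, and an automated grader attempts to recover it. All function names, prompts, and the callable-based model interface are illustrative assumptions, not ExpressivityArena's actual API.

```python
# Minimal sketch of an expressivity-grading loop. Everything here is a
# hypothetical illustration; it does not reflect ExpressivityArena's API.
from typing import Callable

# A "model" is any function mapping a prompt string to a response string,
# so real LLM API calls can be swapped in without changing the loop.
Model = Callable[[str], str]

EMOTIONS = ["joy", "anger", "sadness"]


def generate_expressive_poem(model: Model, emotion: str) -> str:
    """Ask the model to convey an emotion implicitly, without naming it."""
    prompt = (
        f"Write a four-line poem that conveys {emotion} "
        f"without using the word '{emotion}' or a direct synonym."
    )
    return model(prompt)


def grade_expressed_emotion(grader: Model, poem: str) -> str:
    """Ask an automated grader to recover the implicit emotion."""
    prompt = (
        f"Which single emotion does this poem convey? "
        f"Answer with one word from {EMOTIONS}.\n\n{poem}"
    )
    return grader(prompt).strip().lower()


def expressivity_score(model: Model, grader: Model) -> float:
    """Fraction of target emotions the grader recovers from the model's output."""
    hits = 0
    for emotion in EMOTIONS:
        poem = generate_expressive_poem(model, emotion)
        if grade_expressed_emotion(grader, poem) == emotion:
            hits += 1
    return hits / len(EMOTIONS)


if __name__ == "__main__":
    # Stub models so the sketch runs without API keys; replace with real LLM calls.
    stub_model: Model = lambda prompt: "Tears fall where laughter lived before."
    stub_grader: Model = lambda prompt: "sadness"
    print(f"expressivity score: {expressivity_score(stub_model, stub_grader):.2f}")
```

The design choice of grading by recovery (can a judge identify the intended cue?) rather than by direct quality ratings keeps the metric checkable against a known target label.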