Paraphrases reflect the intuitive human ability to understand the same meaning expressed in different ways. Current paraphrase evaluations of language models primarily use binary approaches, which offer limited interpretability of specific text changes. Atomic paraphrase types (APTs) decompose paraphrases into distinct linguistic changes and offer a granular view of the flexibility of linguistic expression (e.g., a shift in syntax or vocabulary). In this study, we assess human preferences for English paraphrases generated by ChatGPT with ten APTs and five prompting techniques. We introduce APTY (Atomic Paraphrase TYpes), a dataset of 500 sentence-level and word-level annotations by 15 annotators. The dataset also provides a human preference ranking of paraphrases with different types, which can be used to fine-tune models with RLHF and DPO methods. Our results reveal that ChatGPT can generate simple APTs, such as additions and deletions, but struggles with complex structures (e.g., subordination changes). This study contributes to understanding which aspects of paraphrasing language models have already mastered and which remain elusive. In addition, our curated datasets can be used to develop language models with specific linguistic capabilities.