Since paraphrasing is an ill-defined task, the term "paraphrasing" covers text transformation tasks with differing characteristics. Consequently, existing paraphrasing studies have applied quite different (explicit and implicit) criteria for when a pair of texts is to be considered a paraphrase, all of which amount to postulating a certain level of semantic or lexical similarity. In this paper, we conduct a literature review and propose a taxonomy to organize the 25 paraphrasing (sub-)tasks we identified. Using classifiers trained to identify which tasks a given paraphrase instance fits, we find that the distributions of task-specific instances in the known paraphrase corpora vary substantially. Hence, using these corpora without clearly defining the respective paraphrase conditions (which is the usual case) must lead to incomparable and misleading results.