Current approaches to paraphrase generation and detection rely heavily on a single general similarity score, ignoring the intricate linguistic properties of language. This paper introduces two new tasks to address this shortcoming by considering paraphrase types, i.e., specific linguistic perturbations at particular text positions. We name these tasks Paraphrase Type Generation and Paraphrase Type Detection. Our results suggest that while current techniques perform well in a binary classification scenario, i.e., paraphrased or not, the inclusion of fine-grained paraphrase types poses a significant challenge. Although most approaches are good at generating and detecting generally semantically similar content, they fail to understand the intrinsic linguistic variables they manipulate. Models trained to generate and identify paraphrase types also improve on tasks that do not involve paraphrase types. In addition, scaling these models further improves their ability to understand paraphrase types. We believe paraphrase types can unlock a new paradigm for developing paraphrase models and solving tasks in the future.