In this paper, we propose SCANING, an unsupervised framework for paraphrasing via controlled noise injection. We focus on the novel task of paraphrasing algebraic word problems, which has practical applications in online pedagogy: it helps reduce plagiarism and encourages students to understand the material rather than memorize it by rote. This task is more complex than paraphrasing general-domain corpora because the paraphrase must preserve the information critical to solution consistency, handle longer text, and remain diverse. Existing approaches fall short on at least one, if not all, of these facets, which calls for a more comprehensive solution. To this end, we model the noising search space as a composition of contextual and syntactic aspects and sample noising functions consisting of either one or both aspects. This allows us to learn a denoising function that operates over both aspects and produces semantically equivalent and syntactically diverse outputs through grounded noise injection. The denoising function serves as a foundation for learning a paraphrasing function that operates solely in the input-paraphrase space, with no direct dependency on noise. Through extensive automated and manual evaluation across four datasets, we demonstrate that SCANING considerably improves both semantic preservation and paraphrase diversity.
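To make the idea of sampling noising functions from contextual and syntactic aspects more concrete, the following is a minimal illustrative sketch and not the SCANING implementation. The specific operations (token dropping as contextual noise, windowed shuffling as syntactic noise), the rule that numeric tokens are never dropped, and all function names are assumptions chosen for illustration; the abstract does not specify them.

```python
import random
import re

# Assumption: numeric tokens are the "critical information" that must survive noising.
NUMBER = re.compile(r"^\d+(\.\d+)?$")


def is_protected(token):
    """Return True for tokens that look like quantities (e.g. '5', '3.5')."""
    return bool(NUMBER.match(token.strip(".,?!")))


def contextual_noise(tokens, rng, drop_prob=0.2):
    """Hypothetical contextual noise: randomly drop non-numeric tokens."""
    return [t for t in tokens if is_protected(t) or rng.random() >= drop_prob]


def syntactic_noise(tokens, rng, window=3):
    """Hypothetical syntactic noise: shuffle tokens inside fixed-size windows."""
    out = []
    for i in range(0, len(tokens), window):
        chunk = tokens[i:i + window]
        rng.shuffle(chunk)
        out.extend(chunk)
    return out


# A noising function consists of one aspect or of both aspects composed.
NOISERS = {
    "contextual": [contextual_noise],
    "syntactic": [syntactic_noise],
    "both": [contextual_noise, syntactic_noise],
}


def inject_noise(text, rng):
    """Sample a noising function and apply it to a whitespace-tokenized text."""
    aspect = rng.choice(sorted(NOISERS))
    tokens = text.split()
    for fn in NOISERS[aspect]:
        tokens = fn(tokens, rng)
    return aspect, " ".join(tokens)


def make_denoising_pairs(corpus, seed=0):
    """Build (noisy, clean) pairs on which a denoising model could be trained."""
    rng = random.Random(seed)
    return [(inject_noise(text, rng)[1], text) for text in corpus]
```

Calling `make_denoising_pairs` on a list of word problems yields noisy inputs paired with their clean originals. In this sketch the quantities always survive, while word order and surrounding context may be perturbed.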