Optimizing the phrasing of argumentative text is crucial in higher education and professional development. However, assessing whether and how the different claims in a text should be revised is a hard task, especially for novice writers. In this work, we explore the main challenges of identifying argumentative claims in need of specific revisions. By learning from collaborative editing behaviors in online debates, we seek to capture implicit revision patterns in order to develop approaches that guide writers in further improving their arguments. We systematically compare the ability of common word embedding models to capture the differences between versions of the same text, and we analyze their impact on various types of writing issues. To deal with the noisy nature of revision-based corpora, we propose a new sampling strategy based on revision distance. In contrast to approaches from prior work, such sampling can be done without employing additional annotations or judgments. Moreover, we provide evidence that using contextual information and domain knowledge can further improve prediction results. How useful a certain type of context is, however, depends on the issue the claim suffers from.