Simile detection is a valuable task for many applications based on natural language processing (NLP), particularly in the field of literature. However, existing research on simile detection often relies on corpora that are limited in size and do not adequately represent the full range of simile forms. To address this issue, we propose a simile data augmentation method based on \textbf{W}ord replacement And Sentence completion using the GPT-2 language model. Our iterative process, called I-WAS, is designed to improve the quality of the augmented sentences. To better evaluate the performance of our method in real-world applications, we compile a corpus containing a more diverse set of simile forms for experimentation. Our experimental results demonstrate the effectiveness of the proposed data augmentation method for simile detection.