Low-resource information extraction remains challenging because of the information scarcity inherent in limited training examples. Data augmentation is a potential remedy, but existing methods struggle to strike a balance between weak augmentation (e.g., synonym replacement) and drastic augmentation (e.g., conditional generation without proper guidance). This paper introduces a novel paradigm that combines targeted augmentation with back validation to produce augmented examples with greater diversity, polarity, accuracy, and coherence. Extensive experiments demonstrate the effectiveness of the proposed paradigm. We also discuss its identified limitations and outline directions for future improvement.