Data Augmentation for Conflict and Duplicate Detection in Software Engineering Sentence Pairs - 专知论文

会员服务 ·

0

数据增强 · Engineering · Pair · 置换 · 可辨认的 ·

2023 年 5 月 16 日

Data Augmentation for Conflict and Duplicate Detection in Software Engineering Sentence Pairs

翻译：面向软件工程句子对中冲突与重复检测的数据增强技术

Garima Malik,Mucahit Cevik,Ayşe Başar

from arxiv, 10 pages ACM conference

This paper explores the use of text data augmentation techniques to enhance conflict and duplicate detection in software engineering tasks through sentence pair classification. The study adapts generic augmentation techniques such as shuffling, back translation, and paraphrasing and proposes new data augmentation techniques such as Noun-Verb Substitution, target-lemma replacement and Actor-Action Substitution for software requirement texts. A comprehensive empirical analysis is conducted on six software text datasets to identify conflicts and duplicates among sentence pairs. The results demonstrate that data augmentation techniques have a significant impact on the performance of all software pair text datasets. On the other hand, in cases where the datasets are relatively balanced, the use of augmentation techniques may result in a negative effect on the classification performance.

翻译：本文探索了通过句子对分类任务，运用文本数据增强技术提升软件工程任务中冲突与重复检测效果的方法。研究将通用增强技术（如洗牌、反向翻译和释义）进行适配，并提出了针对软件需求文本的新增强技术，包括名词-动词替换、目标词元替换及动作主体替换。我们对六个软件文本数据集进行全面的实证分析，以识别句子对中的冲突与重复。结果表明，数据增强技术对所有软件对文本数据集的性能具有显著影响。另一方面，在数据集相对平衡的情况下，使用增强技术可能会对分类性能产生负面影响。

0

相关内容

数据增强

数据增强在机器学习领域多指采用一些方法（比如数据蒸馏，正负样本均衡等）来提高模型数据集的质量，增强数据。

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

专知会员服务

76+阅读 · 2022年6月28日

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

零样本文本分类，Zero-Shot Learning for Text Classification

零样本文本分类，Zero-Shot Learning for Text Classification

专知会员服务

97+阅读 · 2020年5月31日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

专知会员服务

93+阅读 · 2020年2月12日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

全球人工智能

20+阅读 · 2017年12月17日

【推荐】ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

【推荐】ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

机器学习研究会

20+阅读 · 2017年12月17日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

Capsule Networks解析

Capsule Networks解析

机器学习研究会

11+阅读 · 2017年11月12日

汶川地震灾区克山病大骨节病病区异地重建人群发病的生态比较研究

国家自然科学基金

0+阅读 · 2014年12月31日

Anderson型多酸的不对称修饰及可控组装研究

国家自然科学基金

1+阅读 · 2014年12月31日

Calderon问题和边界刚性问题

国家自然科学基金

0+阅读 · 2013年12月31日

Egr3调控造血干细胞功能的机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

内皮细胞TRPV4-SKCa3耦联稳态失调在高血压血管功能稳态失调中的作用及机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

三类非椭圆代数曲线上的密码特性研究

国家自然科学基金

0+阅读 · 2012年12月31日

大脑皮质抑制性神经环路的发育及其基因调控

国家自然科学基金

0+阅读 · 2012年12月31日

纯碱法清洁、高效综合利用硼泥有价元素过程的反应机理研究

国家自然科学基金

0+阅读 · 2012年12月31日

MicroRNA-101调控CRMP4表达抑制前列腺癌转移的机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

SPARC在强直性脊柱炎发病中的作用机制

国家自然科学基金

0+阅读 · 2011年12月31日

Probabilistic Linguistic Knowledge and Token-level Text Augmentation

Arxiv

0+阅读 · 2023年7月3日

Testing Causality in Scientific Modelling Software

Arxiv

0+阅读 · 2023年6月30日

A Comparison of Reinforcement Learning Frameworks for Software Testing Tasks

Arxiv

0+阅读 · 2023年6月29日

Effectiveness of Data Augmentation for Parameter Efficient Tuning with Limited Data

Arxiv

0+阅读 · 2023年6月29日

Multi-source Semantic Graph-based Multimodal Sarcasm Explanation Generation

Arxiv

0+阅读 · 2023年6月29日

A negation detection assessment of GPTs: analysis with the xNot360 dataset

Arxiv

0+阅读 · 2023年6月29日

Generalized Out-of-Distribution Detection: A Survey

Generalized Out-of-Distribution Detection: A Survey

Arxiv

15+阅读 · 2021年10月21日

Pix2seq: A Language Modeling Framework for Object Detection

Arxiv

10+阅读 · 2021年9月22日

Augmentation for small object detection

Augmentation for small object detection

Arxiv

13+阅读 · 2019年2月19日

Deep Learning for Generic Object Detection: A Survey

Deep Learning for Generic Object Detection: A Survey

Arxiv

14+阅读 · 2018年9月6日

VIP会员

文章信息

相关主题

最新内容

ICML 2026 | CFPO：用反事实策略优化提升多模态推理

ICML 2026 | CFPO：用反事实策略优化提升多模态推理

专知会员服务

1+阅读 · 今天14:45

综述 | 世界动作模型：少做梦，多行动

综述 | 世界动作模型：少做梦，多行动

专知会员服务

1+阅读 · 今天14:43

美以伊冲突：无人机与人工智能的运用

美以伊冲突：无人机与人工智能的运用

专知会员服务

3+阅读 · 今天14:31

《战时图神经网络：整合以色列-伊朗冲突中的网络安全与无人机智能》最新50页文献

《战时图神经网络：整合以色列-伊朗冲突中的网络安全与无人机智能》最新50页文献

专知会员服务

3+阅读 · 今天14:20

《特种部队在透明战场中的生存力》最新报告

《特种部队在透明战场中的生存力》最新报告

专知会员服务

2+阅读 · 今天14:11

《自主无人机蜂群协同与控制系统：人工智能赋能的战场协同与自主任务编排平台》

《自主无人机蜂群协同与控制系统：人工智能赋能的战场协同与自主任务编排平台》

专知会员服务

3+阅读 · 今天14:07

《人工智能生成的零日漏洞：对未来作战的影响》

《人工智能生成的零日漏洞：对未来作战的影响》

专知会员服务

3+阅读 · 今天14:03

《理解伙伴国在防务能力选择中的偏好：探索美国解决方案的替代选择》美智库200页报告

《理解伙伴国在防务能力选择中的偏好：探索美国解决方案的替代选择》美智库200页报告

专知会员服务

2+阅读 · 今天13:59

ICML 2026 | 边界嵌入塑形：用自适应对比学习破解图结构纠缠

ICML 2026 | 边界嵌入塑形：用自适应对比学习破解图结构纠缠

专知会员服务

5+阅读 · 6月22日

综述 | 3D场景图：开放挑战与未来方向

综述 | 3D场景图：开放挑战与未来方向

专知会员服务

8+阅读 · 6月22日

《国防工业6.0：全自主作战系统、量子-人工智能融合与新一代战略威慑》

《国防工业6.0：全自主作战系统、量子-人工智能融合与新一代战略威慑》

专知会员服务

7+阅读 · 6月22日

21世纪的无人机战争

21世纪的无人机战争

专知会员服务

4+阅读 · 6月22日

《伊朗与以色列-美国热战及其对数字技术的影响》

《伊朗与以色列-美国热战及其对数字技术的影响》

专知会员服务

5+阅读 · 6月22日

《量子技术的军事任务技术适配与利用》

《量子技术的军事任务技术适配与利用》

专知会员服务

5+阅读 · 6月22日

《美国陆军军官学校（西点军校）本科生科研中生成式人工智能的使用》

《美国陆军军官学校（西点军校）本科生科研中生成式人工智能的使用》

专知会员服务

8+阅读 · 6月22日

相关VIP内容

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

专知会员服务

76+阅读 · 2022年6月28日

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

零样本文本分类，Zero-Shot Learning for Text Classification

零样本文本分类，Zero-Shot Learning for Text Classification

专知会员服务

97+阅读 · 2020年5月31日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

专知会员服务

93+阅读 · 2020年2月12日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

综述 | 世界动作模型：少做梦，多行动

《战时图神经网络：整合以色列-伊朗冲突中的网络安全与无人机智能》最新50页文献

ICML 2026 | CFPO：用反事实策略优化提升多模态推理

美以伊冲突：无人机与人工智能的运用

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

全球人工智能

20+阅读 · 2017年12月17日

【推荐】ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

【推荐】ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

机器学习研究会

20+阅读 · 2017年12月17日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

Capsule Networks解析

Capsule Networks解析

机器学习研究会

11+阅读 · 2017年11月12日

相关论文

Probabilistic Linguistic Knowledge and Token-level Text Augmentation

Arxiv

0+阅读 · 2023年7月3日

Testing Causality in Scientific Modelling Software

Arxiv

0+阅读 · 2023年6月30日

A Comparison of Reinforcement Learning Frameworks for Software Testing Tasks

Arxiv

0+阅读 · 2023年6月29日

Effectiveness of Data Augmentation for Parameter Efficient Tuning with Limited Data

Arxiv

0+阅读 · 2023年6月29日

Multi-source Semantic Graph-based Multimodal Sarcasm Explanation Generation

Arxiv

0+阅读 · 2023年6月29日

A negation detection assessment of GPTs: analysis with the xNot360 dataset

Arxiv

0+阅读 · 2023年6月29日

Generalized Out-of-Distribution Detection: A Survey

Generalized Out-of-Distribution Detection: A Survey

Arxiv

15+阅读 · 2021年10月21日

Pix2seq: A Language Modeling Framework for Object Detection

Arxiv

10+阅读 · 2021年9月22日

Augmentation for small object detection

Augmentation for small object detection

Arxiv

13+阅读 · 2019年2月19日

Deep Learning for Generic Object Detection: A Survey

Deep Learning for Generic Object Detection: A Survey

Arxiv

14+阅读 · 2018年9月6日

相关基金

汶川地震灾区克山病大骨节病病区异地重建人群发病的生态比较研究

国家自然科学基金

0+阅读 · 2014年12月31日

Anderson型多酸的不对称修饰及可控组装研究

国家自然科学基金

1+阅读 · 2014年12月31日

Calderon问题和边界刚性问题

国家自然科学基金

0+阅读 · 2013年12月31日

Egr3调控造血干细胞功能的机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

内皮细胞TRPV4-SKCa3耦联稳态失调在高血压血管功能稳态失调中的作用及机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

三类非椭圆代数曲线上的密码特性研究

国家自然科学基金

0+阅读 · 2012年12月31日

大脑皮质抑制性神经环路的发育及其基因调控

国家自然科学基金

0+阅读 · 2012年12月31日

纯碱法清洁、高效综合利用硼泥有价元素过程的反应机理研究

国家自然科学基金

0+阅读 · 2012年12月31日

MicroRNA-101调控CRMP4表达抑制前列腺癌转移的机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

SPARC在强直性脊柱炎发病中的作用机制

国家自然科学基金

0+阅读 · 2011年12月31日

微信扫码咨询专知VIP会员