Reproducibility is Nothing without Correctness: The Importance of Testing Code in NLP

Correctness · Reproducibility · Potential · Program correctness · Conformer

March 31, 2023

Sara Papi,Marco Gaido,Andrea Pilzer,Matteo Negri

Despite its pivotal role in research experiments, code correctness is often presumed only on the basis of the perceived quality of the results. This comes with the risk of erroneous outcomes and potentially misleading findings. To address this issue, we posit that the current focus on result reproducibility should go hand in hand with an emphasis on coding best practices. We bolster our call to the NLP community by presenting a case study, in which we identify (and correct) three bugs in widely used open-source implementations of the state-of-the-art Conformer architecture. Through comparative experiments on automatic speech recognition and translation in various language settings, we demonstrate that the existence of bugs does not prevent the achievement of good and reproducible results, yet it can lead to incorrect conclusions that potentially misguide future research. In response, this study is a call to action toward the adoption of coding best practices aimed at fostering correctness and improving the quality of the developed software.
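As a rough illustration of how code can be reproducible without being correct, the sketch below tests a toy layer for one property: a sequence should get the same output whether it is processed alone or inside a padded batch. The abstract does not describe the three bugs, so this is not a reconstruction of them. Every name here (`pad_batch`, `mean_pool_buggy`, `mean_pool_fixed`, `is_padding_invariant`) is hypothetical and chosen only for this example.

```python
from typing import Callable, List, Sequence, Tuple

Batch = List[List[float]]
Layer = Callable[[Batch, List[int]], List[float]]


def pad_batch(seqs: Sequence[Sequence[float]],
              pad_value: float = 0.0) -> Tuple[Batch, List[int]]:
    """Right-pad every sequence to the longest one; also return true lengths."""
    max_len = max(len(s) for s in seqs)
    batch = [list(s) + [pad_value] * (max_len - len(s)) for s in seqs]
    return batch, [len(s) for s in seqs]


def mean_pool_buggy(batch: Batch, lengths: List[int]) -> List[float]:
    # Hypothetical bug: the average also runs over padded positions,
    # so the result depends on what else is in the batch.
    return [sum(row) / len(row) for row in batch]


def mean_pool_fixed(batch: Batch, lengths: List[int]) -> List[float]:
    # Only the first `n` (real) positions of each row contribute.
    return [sum(row[:n]) / n for row, n in zip(batch, lengths)]


def is_padding_invariant(layer: Layer,
                         seqs: Sequence[Sequence[float]],
                         tol: float = 1e-9) -> bool:
    """True if each sequence gets the same output alone and in a padded batch."""
    batched = layer(*pad_batch(seqs))
    for seq, out_in_batch in zip(seqs, batched):
        out_alone = layer(*pad_batch([seq]))[0]
        if abs(out_alone - out_in_batch) > tol:
            return False
    return True


seqs = [[1.0, 2.0, 3.0], [4.0, 5.0]]
print(is_padding_invariant(mean_pool_buggy, seqs))  # False
print(is_padding_invariant(mean_pool_fixed, seqs))  # True
```

The buggy layer is deterministic, so every rerun on the same batch reproduces the same numbers, and those numbers look plausible. Only a test that varies the batch composition exposes the error.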
