Pushing the Right Buttons: Adversarial Evaluation of Quality Estimation - 专知论文

会员服务 ·

0

估计/估计量 · 机器翻译 · 相关系数 · 均值 · Performer ·

2021 年 9 月 22 日

Pushing the Right Buttons: Adversarial Evaluation of Quality Estimation

翻译：推动右按钮:质量估算的反向评估

Diptesh Kanojia,Marina Fomicheva,Tharindu Ranasinghe,Frédéric Blain,Constantin Orăsan,Lucia Specia

from arxiv, Accepted to WMT 2021 Conference co-located with EMNLP 2021. 14 pages with a 4 page appendix

Current Machine Translation (MT) systems achieve very good results on a growing variety of language pairs and datasets. However, they are known to produce fluent translation outputs that can contain important meaning errors, thus undermining their reliability in practice. Quality Estimation (QE) is the task of automatically assessing the performance of MT systems at test time. Thus, in order to be useful, QE systems should be able to detect such errors. However, this ability is yet to be tested in the current evaluation practices, where QE systems are assessed only in terms of their correlation with human judgements. In this work, we bridge this gap by proposing a general methodology for adversarial testing of QE for MT. First, we show that despite a high correlation with human judgements achieved by the recent SOTA, certain types of meaning errors are still problematic for QE to detect. Second, we show that on average, the ability of a given model to discriminate between meaning-preserving and meaning-altering perturbations is predictive of its overall performance, thus potentially allowing for comparing QE systems without relying on manual quality annotation.

翻译：目前的机器翻译系统在越来越多的多种语文配对和数据集上取得了非常好的结果,然而,据知它们能够产生流畅的翻译产出,其中含有重要的含义错误,从而损害其在实践中的可靠性。质量估计(QE)是测试时自动评估MT系统性能的任务。因此,为了有用,QE系统应当能够发现这些错误。然而,在目前的评价做法中,这种能力尚有待测试,因为只有从与人类判断的相关性的角度来评估QE系统。在这项工作中,我们提出对质测试MTQE的一般方法来弥补这一差距。首先,我们表明,尽管与最近的SOTA的人类判断有高度的相关性,但某些类型的意义错误仍然对QE的检测有问题。第二,我们表明,平均而言,某一模式区分保留意义和改变意义之间的差别的能力是对其总体性能的预测,因此有可能允许在不依赖手动质量说明的情况下比较QE系统。

0

相关内容

估计/估计量

估计/估计量

INRIA 最新《机器学习理论》课程笔记，176页pdf

专知会员服务

52+阅读 · 2020年12月14日

2020数据工程师成长路线图

专知会员服务

41+阅读 · 2020年9月6日

【Google】平滑对抗训练，Smooth Adversarial Training

【Google】平滑对抗训练，Smooth Adversarial Training

专知会员服务

49+阅读 · 2020年7月4日

【电子书】机器学习实战（Machine Learning in Action），附PDF

【电子书】机器学习实战（Machine Learning in Action），附PDF

专知会员服务

132+阅读 · 2019年11月25日

253页通俗易懂最新的机器学习系统入门书籍（Machine-Learning-Systems）（附pdf下载）

253页通俗易懂最新的机器学习系统入门书籍（Machine-Learning-Systems）（附pdf下载）

专知会员服务

77+阅读 · 2019年10月27日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

84+阅读 · 2019年10月9日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

已删除

将门创投

4+阅读 · 2018年11月6日

人体姿态估计资源大列表（Human Pose Estimation）

人体姿态估计资源大列表（Human Pose Estimation）

专知

9+阅读 · 2018年10月6日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

人工智能 | 国际会议截稿信息9条

人工智能 | 国际会议截稿信息9条

Call4Papers

4+阅读 · 2018年3月13日

计算机视觉近一年进展综述

计算机视觉近一年进展综述

机器学习研究会

9+阅读 · 2017年11月25日

Capsule Networks解析

Capsule Networks解析

机器学习研究会

11+阅读 · 2017年11月12日

Adversarial Variational Bayes: Unifying VAE and GAN 代码

Adversarial Variational Bayes: Unifying VAE and GAN 代码

CreateAMind

7+阅读 · 2017年10月4日

【推荐】GAN架构入门综述(资源汇总)

【推荐】GAN架构入门综述(资源汇总)

机器学习研究会

10+阅读 · 2017年9月3日

Adversarial Machine Learning in Image Classification: A Survey Towards the Defender's Perspective

Adversarial Machine Learning in Image Classification: A Survey Towards the Defender's Perspective

Arxiv

17+阅读 · 2020年9月8日

Text Detection and Recognition in the Wild: A Review

Arxiv

20+阅读 · 2020年6月8日

Multi-Task Self-Supervised Learning for Disfluency Detection

Arxiv

5+阅读 · 2019年8月15日

Adversarial Metric Attack for Person Re-identification

Adversarial Metric Attack for Person Re-identification

Arxiv

3+阅读 · 2019年1月30日

Context in Neural Machine Translation: A Review of Models and Evaluations

Arxiv

5+阅读 · 2019年1月25日

GAN Dissection: Visualizing and Understanding Generative Adversarial Networks

GAN Dissection: Visualizing and Understanding Generative Adversarial Networks

Arxiv

11+阅读 · 2018年12月8日

Evaluating and Understanding the Robustness of Adversarial Logit Pairing

Arxiv

8+阅读 · 2018年7月26日

Are Generative Classifiers More Robust to Adversarial Attacks?

Are Generative Classifiers More Robust to Adversarial Attacks?

Arxiv

4+阅读 · 2018年7月9日

An Improved Evaluation Framework for Generative Adversarial Networks

Arxiv

3+阅读 · 2018年3月27日

Where to put the Image in an Image Caption Generator

Arxiv

3+阅读 · 2018年3月14日

VIP会员

文章信息

相关主题

估计/估计量

最新内容

ICML 2026 | 多任务贝叶斯上下文学习：让 Transformer 在测试时显式适应新先验

ICML 2026 | 多任务贝叶斯上下文学习：让 Transformer 在测试时显式适应新先验

专知会员服务

3+阅读 · 6月19日

ACL 2026综述 | 大规模手语数据集：资源、基准与标注标准

ACL 2026综述 | 大规模手语数据集：资源、基准与标注标准

专知会员服务

5+阅读 · 6月19日

ICML 2026 Spotlight | SmoothSMoE：解析稀疏 MoE 路由不连续

ICML 2026 Spotlight | SmoothSMoE：解析稀疏 MoE 路由不连续

专知会员服务

6+阅读 · 6月18日

综述 | 周期表视角下的大模型推理：范式、方法与失败模式

综述 | 周期表视角下的大模型推理：范式、方法与失败模式

专知会员服务

7+阅读 · 6月18日

《廉价自杀式无人机战争的军事战略影响：乌克兰和伊朗案例研究》

《廉价自杀式无人机战争的军事战略影响：乌克兰和伊朗案例研究》

专知会员服务

11+阅读 · 6月18日

《面向反无人机作战的联邦式可解释射频–光电/红外情报融合：边缘人工智能优化、电子战韧性及分布式监视验证》

《面向反无人机作战的联邦式可解释射频–光电/红外情报融合：边缘人工智能优化、电子战韧性及分布式监视验证》

专知会员服务

10+阅读 · 6月18日

ICML 2026 | FR3D：解耦自车运动的未来动态三维重建世界模型

ICML 2026 | FR3D：解耦自车运动的未来动态三维重建世界模型

专知会员服务

7+阅读 · 6月17日

【伯克利博士论文】迈向可扩展与自我演进的大语言模型智能体

【伯克利博士论文】迈向可扩展与自我演进的大语言模型智能体

专知会员服务

11+阅读 · 6月17日

学习数据的几何：形状空间分析数学综述

学习数据的几何：形状空间分析数学综述

专知会员服务

7+阅读 · 6月17日

《现代防空系统综述：架构、传感器、拦截器及新兴威胁环境对基础设施受限防御环境的影响》2026最新长综述

《现代防空系统综述：架构、传感器、拦截器及新兴威胁环境对基础设施受限防御环境的影响》2026最新长综述

专知会员服务

15+阅读 · 6月17日

定向能反无人机系统最新发展动态

定向能反无人机系统最新发展动态

专知会员服务

8+阅读 · 6月17日

从燃煤战舰到算法战争：水面指挥的永恒要求

从燃煤战舰到算法战争：水面指挥的永恒要求

专知会员服务

6+阅读 · 6月17日

《短程弹道再入飞行器拦截时间中的一项异常现象》

《短程弹道再入飞行器拦截时间中的一项异常现象》

专知会员服务

8+阅读 · 6月17日

《基于回归方法与任务上下文的对抗环境动态战术网络报文优先级排序》

《基于回归方法与任务上下文的对抗环境动态战术网络报文优先级排序》

专知会员服务

8+阅读 · 6月17日

美智库《战术级指挥控制的迫切要求：构建弹性机动式指挥控制网络》报告

美智库《战术级指挥控制的迫切要求：构建弹性机动式指挥控制网络》报告

专知会员服务

10+阅读 · 6月17日

相关VIP内容

INRIA 最新《机器学习理论》课程笔记，176页pdf

专知会员服务

52+阅读 · 2020年12月14日

2020数据工程师成长路线图

专知会员服务

41+阅读 · 2020年9月6日

【Google】平滑对抗训练，Smooth Adversarial Training

【Google】平滑对抗训练，Smooth Adversarial Training

专知会员服务

49+阅读 · 2020年7月4日

【电子书】机器学习实战（Machine Learning in Action），附PDF

【电子书】机器学习实战（Machine Learning in Action），附PDF

专知会员服务

132+阅读 · 2019年11月25日

253页通俗易懂最新的机器学习系统入门书籍（Machine-Learning-Systems）（附pdf下载）

253页通俗易懂最新的机器学习系统入门书籍（Machine-Learning-Systems）（附pdf下载）

专知会员服务

77+阅读 · 2019年10月27日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

84+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

ICML 2026 | 多任务贝叶斯上下文学习：让 Transformer 在测试时显式适应新先验

ACL 2026综述 | 大规模手语数据集：资源、基准与标注标准

相关资讯

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

已删除

将门创投

4+阅读 · 2018年11月6日

人体姿态估计资源大列表（Human Pose Estimation）

人体姿态估计资源大列表（Human Pose Estimation）

专知

9+阅读 · 2018年10月6日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

人工智能 | 国际会议截稿信息9条

人工智能 | 国际会议截稿信息9条

Call4Papers

4+阅读 · 2018年3月13日

计算机视觉近一年进展综述

计算机视觉近一年进展综述

机器学习研究会

9+阅读 · 2017年11月25日

Capsule Networks解析

Capsule Networks解析

机器学习研究会

11+阅读 · 2017年11月12日

Adversarial Variational Bayes: Unifying VAE and GAN 代码

Adversarial Variational Bayes: Unifying VAE and GAN 代码

CreateAMind

7+阅读 · 2017年10月4日

【推荐】GAN架构入门综述(资源汇总)

【推荐】GAN架构入门综述(资源汇总)

机器学习研究会

10+阅读 · 2017年9月3日

相关论文

Adversarial Machine Learning in Image Classification: A Survey Towards the Defender's Perspective

Adversarial Machine Learning in Image Classification: A Survey Towards the Defender's Perspective

Arxiv

17+阅读 · 2020年9月8日

Text Detection and Recognition in the Wild: A Review

Arxiv

20+阅读 · 2020年6月8日

Multi-Task Self-Supervised Learning for Disfluency Detection

Arxiv

5+阅读 · 2019年8月15日

Adversarial Metric Attack for Person Re-identification

Adversarial Metric Attack for Person Re-identification

Arxiv

3+阅读 · 2019年1月30日

Context in Neural Machine Translation: A Review of Models and Evaluations

Arxiv

5+阅读 · 2019年1月25日

GAN Dissection: Visualizing and Understanding Generative Adversarial Networks

GAN Dissection: Visualizing and Understanding Generative Adversarial Networks

Arxiv

11+阅读 · 2018年12月8日

Evaluating and Understanding the Robustness of Adversarial Logit Pairing

Arxiv

8+阅读 · 2018年7月26日

Are Generative Classifiers More Robust to Adversarial Attacks?

Are Generative Classifiers More Robust to Adversarial Attacks?

Arxiv

4+阅读 · 2018年7月9日

An Improved Evaluation Framework for Generative Adversarial Networks

Arxiv

3+阅读 · 2018年3月27日

Where to put the Image in an Image Caption Generator

Arxiv

3+阅读 · 2018年3月14日

微信扫码咨询专知VIP会员