Design Choices for Crowdsourcing Implicit Discourse Relations: Revealing the Biases Introduced by Task Design

April 3, 2023

Valentina Pyatkin, Frances Yung, Merel C. J. Scholman, Reut Tsarfaty, Ido Dagan, Vera Demberg

From arXiv. Accepted to TACL; pre-MIT Press publication version.

Disagreement in natural language annotation has mostly been studied from the perspective of biases introduced by the annotators and the annotation frameworks. Here, we propose to analyze another source of bias: task design bias, which has a particularly strong impact on crowdsourced linguistic annotations, where natural language is used to elicit the interpretations of lay annotators. For this purpose, we look at implicit discourse relation annotation, a task that has repeatedly been shown to be difficult because the relations are ambiguous. We compare the annotations of 1,200 discourse relations obtained using two distinct annotation tasks and quantify the biases of both methods across four different domains. Both methods are natural language annotation tasks designed for crowdsourcing. We show that the task design can push annotators towards certain relations and that some discourse relation senses are better elicited with one annotation approach than with the other. We also conclude that this type of bias should be taken into account when training and testing models.
