The Challenges of Machine Learning for Trust and Safety: A Case Study on Misinformation Detection

We examine the disconnect between scholarship and practice in applying machine learning to trust and safety problems, using misinformation detection as a case study. We survey literature on automated detection of misinformation across a corpus of 248 well-cited papers in the field. We then examine subsets of papers for data and code availability, design missteps, reproducibility, and generalizability. Our paper corpus includes published work in security, natural language processing, and computational social science. Across these disparate disciplines, we identify common errors in dataset and method design. In general, detection tasks are often meaningfully distinct from the challenges that online services actually face. Datasets and model evaluation are often non-representative of real-world contexts, and evaluation is frequently not independent of model training. We demonstrate the limitations of current detection methods in a series of three representative replication studies. Based on the results of these analyses and our literature survey, we conclude that the current state of the art in fully automated misinformation detection has limited efficacy in detecting human-generated misinformation. We offer recommendations for evaluating applications of machine learning to trust and safety problems and recommend future directions for research.

