The Challenges of Machine Learning for Trust and Safety: A Case Study on Misinformation Detection

We examine the disconnect between scholarship and practice in applying machine learning to trust and safety problems, using misinformation detection as a case study. We systematize literature on automated detection of misinformation across a corpus of 270 well-cited papers in the field. We then examine subsets of papers for data and code availability, design missteps, reproducibility, and generalizability. Our paper corpus includes published work in security, natural language processing, and computational social science. Across these disparate disciplines, we identify common errors in dataset and method design. In general, detection tasks are often meaningfully distinct from the challenges that online services actually face. Datasets and model evaluation are often non-representative of real-world contexts, and evaluation frequently is not independent of model training. Data and code availability is poor. We demonstrate the limitations of current detection methods in a series of three replication studies. Based on the results of these analyses and our literature survey, we offer recommendations for evaluating applications of machine learning to trust and safety problems in general. Our aim is for future work to avoid the pitfalls that we identify.
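One pitfall named above, evaluation that is not independent of model training, can be made concrete with a small check. The sketch below is illustrative only and is not the authors' method. It assumes a hypothetical record layout with `text`, `label`, and `date` fields, splits the data by time so the test set postdates the training set, and then flags test items whose normalized text already appears in training. The helper names `normalize`, `temporal_split`, and `leaked_items` are made up for this example.

```python
from datetime import date


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies match."""
    return " ".join(text.lower().split())


def temporal_split(records, cutoff):
    """Train on items dated before `cutoff`; evaluate on items dated on or after it."""
    train = [r for r in records if r["date"] < cutoff]
    test = [r for r in records if r["date"] >= cutoff]
    return train, test


def leaked_items(train, test):
    """Return test records whose normalized text also appears in the training set."""
    seen = {normalize(r["text"]) for r in train}
    return [r for r in test if normalize(r["text"]) in seen]


# Hypothetical toy data: the third item is a near-verbatim repost of the first.
records = [
    {"text": "Vaccine causes X", "label": 1, "date": date(2020, 1, 5)},
    {"text": "City council approves budget", "label": 0, "date": date(2020, 2, 1)},
    {"text": "vaccine  causes x", "label": 1, "date": date(2020, 3, 10)},
    {"text": "New bridge opens downtown", "label": 0, "date": date(2020, 4, 2)},
]

train, test = temporal_split(records, cutoff=date(2020, 3, 1))
leaks = leaked_items(train, test)
print(f"{len(leaks)} of {len(test)} test items also appear in training")
```

A temporal split alone does not guarantee independence: reposted or lightly edited content can still cross the cutoff, which is why the duplicate check runs after the split. Exact matching on normalized text is a deliberately simple stand-in; a real pipeline would likely need fuzzier near-duplicate detection.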
