A Perspective on Crowdsourcing and Human-in-the-Loop Workflows in Precision Health

Modern machine learning approaches have led to performant diagnostic models for a variety of health conditions. Several machine learning approaches, such as decision trees and deep neural networks, can, in principle, approximate any function. However, this power is both a gift and a curse, as the propensity toward overfitting is magnified when the input data are heterogeneous and high dimensional and the output class is a highly nonlinear function of those inputs. This issue can especially plague diagnostic systems that predict behavioral and psychiatric conditions that are diagnosed with subjective criteria. An emerging solution to this issue is crowdsourcing, where crowd workers annotate complex behavioral features in return for monetary compensation or a gamified experience. These labels can then be used to derive a diagnosis, either directly or by using the labels as inputs to a diagnostic machine learning model. This viewpoint describes existing work in this nascent field and discusses ongoing challenges and opportunities with crowd-powered diagnostic systems. With the correct considerations, the addition of crowdsourcing to human-in-the-loop machine learning workflows for the prediction of complex and nuanced health conditions can accelerate screening, diagnostics, and ultimately access to care.
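
As an illustration of the two uses of crowd labels mentioned above (deriving a result directly, or passing the labels to a downstream model), a minimal Python sketch might look like the following. Everything in it is an assumption made for illustration: the feature names, the ordinal rating scale, the use of the median as the consensus rule, and the summed-score threshold are hypothetical and are not taken from the viewpoint itself.

```python
from collections import defaultdict
from statistics import median


def aggregate_crowd_labels(annotations):
    """Collapse per-worker ratings into one consensus value per feature.

    `annotations` is a list of (worker_id, feature_name, rating) tuples.
    The median is an assumed consensus rule, chosen here because a single
    careless worker cannot move it far.
    """
    by_feature = defaultdict(list)
    for _worker, feature, rating in annotations:
        by_feature[feature].append(rating)
    return {feature: median(ratings) for feature, ratings in by_feature.items()}


def direct_screen(consensus, threshold):
    """Hypothetical direct rule: flag when summed consensus ratings reach a threshold."""
    return sum(consensus.values()) >= threshold


def to_feature_vector(consensus, feature_order):
    """Order consensus ratings so they can be passed to a downstream model."""
    return [consensus[name] for name in feature_order]


# Three hypothetical workers rate two hypothetical behavioral features.
annotations = [
    ("w1", "eye_contact", 2), ("w2", "eye_contact", 3), ("w3", "eye_contact", 2),
    ("w1", "repetitive_motion", 1), ("w2", "repetitive_motion", 1), ("w3", "repetitive_motion", 3),
]
consensus = aggregate_crowd_labels(annotations)
flagged = direct_screen(consensus, threshold=3)
features = to_feature_vector(consensus, ["eye_contact", "repetitive_motion"])
```

In the second use, `features` would be one input row for whatever diagnostic classifier a study trains; the choice of that model is outside the scope of this sketch.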
