Supervised machine learning for microbiomics: bridging the gap between current and best practices

Machine learning (ML) is set to accelerate innovations in clinical microbiomics, such as in disease diagnostics and prognostics. This will require high-quality, reproducible, interpretable workflows whose predictive capabilities meet or exceed the high thresholds set for clinical tools by regulatory agencies. Here, we capture a snapshot of current practices in the application of supervised ML to microbiomics data, through an in-depth analysis of 100 peer-reviewed journal articles published in 2021-2022. We apply a data-driven approach to steer discussion of the merits of varied approaches to experimental design, including key considerations such as how to mitigate the effects of small dataset size while avoiding data leakage. We further provide guidance on how to avoid common experimental design pitfalls that can hurt model performance, trustworthiness, and reproducibility. Discussion is accompanied by an interactive online tutorial that demonstrates foundational principles of ML experimental design, tailored to the microbiomics community. Formalizing community best practices for supervised ML in microbiomics is an important step towards improving the success and efficiency of clinical research, to the benefit of patients and other stakeholders.

翻译：机器学习（ML）有望加速临床微生物组学领域的创新，例如在疾病诊断和预后方面。这需要高质量、可重复、可解释的工作流程，其预测能力需达到或超过监管机构为临床工具设定的高门槛。本文通过对2021-2022年间发表的100篇同行评审期刊文章进行深入分析，捕捉了监督式机器学习应用于微生物组学数据的当前实践概况。我们采用数据驱动的方法来探讨不同实验设计方法的优劣，包括关键考量因素，例如如何在避免数据泄露的同时缓解小数据集规模的影响。我们进一步提供指导，说明如何避免可能损害模型性能、可信度和可重复性的常见实验设计缺陷。讨论内容辅以交互式在线教程，该教程展示了机器学习实验设计的基本原理，并针对微生物组学领域进行了定制。为微生物组学中的监督式机器学习制定社区最佳实践，是提高临床研究成功率和效率的重要一步，这将惠及患者及其他利益相关者。

相关内容

Machine Learning

关注 2251

机器学习（Machine Learning）是一个研究计算学习方法的国际论坛。该杂志发表文章，报告广泛的学习方法应用于各种学习问题的实质性结果。该杂志的特色论文描述研究的问题和方法，应用研究和研究方法的问题。有关学习问题或方法的论文通过实证研究、理论分析或与心理现象的比较提供了坚实的支持。应用论文展示了如何应用学习方法来解决重要的应用问题。研究方法论文改进了机器学习的研究方法。所有的论文都以其他研究人员可以验证或复制的方式描述了支持证据。论文还详细说明了学习的组成部分，并讨论了关于知识表示和性能任务的假设。官网地址：http://dblp.uni-trier.de/db/journals/ml/

《生成式模型: 变分自编码器与扩散模型》，75页ppt，Google DeepMind科学家Ruiqi Gao

专知会员服务

66+阅读 · 2023年6月10日

【CVPR 2022】一个完全无监督的框架，从噪声和部分测量中学习图像，Robust Equivariant Imaging: a fully unsupervised framework for learning to image

专知会员服务

25+阅读 · 2022年3月3日

UCM《机器学习导论笔记》，80页pdf CSE176 Introduction to Machine Learning

专知会员服务

32+阅读 · 2021年9月29日

【AI应用】Facebook-利用神经网络求解高等数学方程, Using neural networks to solve advanced mathematics equations

专知会员服务

34+阅读 · 2020年1月15日