Utilising high-dimensional data in randomised clinical trials: a review of methods and practice

Introduction: Even in well-conducted randomised trials, the probability of a successful study remains relatively low. With recent advances in next-generation sequencing technologies, a rapidly growing volume of high-dimensional data, including genetic, molecular and phenotypic information, has become available and has improved our understanding of driver genes, drug targets and drug mechanisms of action. Leveraging these high-dimensional data holds promise for increasing the success rate of clinical trials.

Methods: We provide an overview of methods for utilising high-dimensional data in clinical trials. We also investigate how these methods are used in practice through a review of recently published randomised clinical trials that utilise high-dimensional genetic data. The review includes articles published between 2019 and 2021, identified through the PubMed database.

Results: Of 174 screened articles, 100 (57.5%) were randomised clinical trials that collected high-dimensional data. The most common clinical area was oncology (30%), followed by chronic diseases (28%), nutrition and ageing (18%) and cardiovascular diseases (7%). The most common types of data analysed were gene expression data (70%), followed by DNA data (21%). The most common method of analysis (36.3%) was univariable analysis, and articles that described multivariable analyses used standard statistical methods. Most of the trials had two arms.

Discussion: New methodological approaches are required to analyse more efficiently the increasing amount of high-dimensional data collected in randomised clinical trials. We highlight the limitations of, and barriers to, the current use of high-dimensional data in trials, and suggest potential avenues for improvement and future work.
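The abstract reports that univariable analysis was the most common approach. As an illustration only, the following Python sketch shows what a univariable scan of high-dimensional data in a two-arm trial might look like. Each feature (for example, one gene's expression level) is compared between arms on its own. Several choices here are assumptions, because the abstract does not say which tests the reviewed trials used: Welch's t statistic, the normal approximation used for its p-values, and the Benjamini-Hochberg false discovery rate adjustment, which is a common companion step when many features are tested. The function names and the simulated data are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, variance


def welch_t(x, y):
    """Welch's t statistic for the difference in means of two groups."""
    se = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(x) - mean(y)) / se


def univariable_scan(expr, arm):
    """Test each feature on its own for a difference between two trial arms.

    expr: patients x features matrix as a list of lists.
    arm:  0/1 arm label per patient.
    Two-sided p-values use a normal approximation to Welch's t, which is
    only reasonable when both arms are fairly large (an assumption here).
    """
    std_normal = NormalDist()
    pvals = []
    for j in range(len(expr[0])):
        treated = [row[j] for row, g in zip(expr, arm) if g == 1]
        control = [row[j] for row, g in zip(expr, arm) if g == 0]
        t = welch_t(treated, control)
        pvals.append(2 * (1 - std_normal.cdf(abs(t))))
    return pvals


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values controlling the false discovery rate."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical simulated trial: 50 patients per arm, 200 genes.
# Only gene 0 differs between arms (mean shift of 1.5 SD in the treated arm).
random.seed(1)
n_per_arm, n_genes = 50, 200
arm = [1] * n_per_arm + [0] * n_per_arm
expr = [
    [random.gauss(1.5 if (g == 1 and j == 0) else 0.0, 1.0) for j in range(n_genes)]
    for g in arm
]
p = univariable_scan(expr, arm)
q = benjamini_hochberg(p)
print("gene with smallest adjusted p-value:", min(range(n_genes), key=lambda j: q[j]))
```

Scanning features one at a time is simple. However, it ignores correlations between features, which is one reason multivariable approaches may be preferred for high-dimensional data.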
