Utilising high-dimensional data in randomised clinical trials: a review of methods and practice

Introduction: Even when randomised trials are well conducted, the probability of a successful study remains relatively low. Recent advances in next-generation sequencing technologies have produced a rapidly growing volume of high-dimensional data, including genetic, molecular and phenotypic information, that has improved our understanding of driver genes, drug targets, and drug mechanisms of action. Leveraging high-dimensional data holds promise for increasing the success of clinical trials.

Methods: We provide an overview of methods for utilising high-dimensional data in clinical trials. We also investigate how these methods are used in practice through a review of recently published randomised clinical trials that utilise high-dimensional genetic data. The review includes articles published between 2019 and 2021 and identified through the PubMed database.

Results: Of 174 screened articles, 100 (57.5%) were randomised clinical trials that collected high-dimensional data. The most common clinical area was oncology (30%), followed by chronic diseases (28%), nutrition and ageing (18%) and cardiovascular diseases (7%). The most common type of data analysed was gene expression data (70%), followed by DNA data (21%). The most common method of analysis was univariable analysis (36.3%). Articles that described multivariable analyses used standard statistical methods. Most of the clinical trials had two arms.

Discussion: New methodological approaches are required to analyse more efficiently the increasing amount of high-dimensional data collected in randomised clinical trials. We highlight the limitations of, and barriers to, the current use of high-dimensional data in trials, and suggest potential avenues for improvement and future work.
