Integrated analysis of multi-omics datasets holds great promise for uncovering complex biological processes. However, the high dimensionality of omics data poses significant interpretability and multiple testing challenges. Simultaneous Enrichment Analysis (SEA) was introduced to address these issues in single-omics analysis, providing built-in multiple testing correction and enabling simultaneous testing of feature sets. In this paper, we introduce OCEAN, an extension of SEA to multi-omics data. OCEAN is a flexible approach for analyzing potentially all possible two-way feature sets from any pair of genomics datasets. We also propose two new error rates that are in line with the two-way structure of the data and facilitate interpretation of the results. The power and utility of OCEAN are demonstrated by analyzing copy number and gene expression data for breast and colon cancer.