Given the increasing volume and quality of genomics data, extracting new insights requires interpretable machine-learning models. This work presents Genomic Interpreter, a novel architecture for genomic assay prediction that outperforms state-of-the-art models on these tasks and can identify hierarchical dependencies among genomic sites. This is achieved through the integration of 1D-Swin, a novel Transformer-based block that we designed for modelling long-range hierarchical data. Evaluated on a dataset containing 38,171 DNA segments of 17K base pairs each, Genomic Interpreter demonstrates superior performance in chromatin accessibility and gene expression prediction and unmasks the underlying `syntax' of gene regulation.