Current biological AI models lack interpretability: their internal representations do not correspond to biological relationships that researchers can examine. Here we present CDT-II, an "AI microscope" whose attention maps are directly interpretable as regulatory structure. By mirroring the central dogma in its architecture, CDT-II ensures that each attention mechanism corresponds to a specific biological relationship: DNA self-attention for genomic relationships, RNA self-attention for gene co-regulation, and DNA-to-RNA cross-attention for transcriptional control. Using only genomic embeddings and raw per-cell expression, CDT-II enables experimental biologists to observe regulatory networks in their own data. Applied to K562 CRISPRi data, CDT-II predicts perturbation effects (per-gene mean $r = 0.84$) and recovers the GFI1B regulatory network without supervision (6.6-fold enrichment, $P = 3.5 \times 10^{-17}$). Systematic comparison against ENCODE K562 regulatory annotations reveals that, across all five held-out genes, cross-attention autonomously focuses on known regulatory elements: DNase hypersensitive sites ($201\times$ enrichment), CTCF binding sites ($28\times$), and histone marks. Two distinct attention mechanisms independently identify an overlapping RNA-processing module (80% gene overlap; RNA-binding enrichment $P = 1 \times 10^{-16}$). CDT-II establishes mechanism-oriented AI as an alternative to task-oriented approaches, revealing regulatory structure rather than merely optimizing predictions.
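To make the three attention maps named in the abstract concrete, the following is a minimal sketch of scaled dot-product attention weights on toy embeddings. It is not the CDT-II implementation: the helper `attention_map`, the toy vectors, the absence of learned projections, and the choice of RNA (gene) tokens as queries over DNA (genomic position) tokens as keys in the cross-attention are all assumptions made for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_map(queries, keys):
    """Scaled dot-product attention weights.

    Returns one row per query and one column per key; each row sums to 1.
    Hypothetical helper: learned query/key projections and value vectors
    are deliberately omitted to keep the sketch small.
    """
    scale = math.sqrt(len(keys[0]))
    return [
        softmax([sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys])
        for q in queries
    ]


# Toy embeddings (illustrative values only, not real data).
dna_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 genomic positions
rna_tokens = [[2.0, 0.0], [0.0, 2.0]]              # 2 genes

# The three maps, one per biological relationship described in the abstract.
dna_self = attention_map(dna_tokens, dna_tokens)  # position-to-position, 3 x 3
rna_self = attention_map(rna_tokens, rna_tokens)  # gene-to-gene, 2 x 2
# Assumption: genes act as queries and genomic positions as keys.
cross = attention_map(rna_tokens, dna_tokens)     # gene-to-position, 2 x 3

for gene_index, weights in enumerate(cross):
    print(gene_index, [round(w, 3) for w in weights])
```

Under these assumptions, each row of `cross` is a distribution over genomic positions for one gene, which is the kind of map that could then be compared against regulatory annotations.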