Current biological AI models lack interpretability: their internal representations do not correspond to biological relationships that researchers can examine. Here we present CDT-II, an "AI microscope" whose attention maps are directly interpretable as regulatory structure. Because its architecture mirrors the central dogma, each attention mechanism corresponds to a specific biological relationship: DNA self-attention for genomic relationships, RNA self-attention for gene co-regulation, and DNA-to-RNA cross-attention for transcriptional control. Using only genomic embeddings and raw per-cell expression, CDT-II enables experimental biologists to observe regulatory networks in their own data. Applied to K562 CRISPRi data, CDT-II predicts perturbation effects (per-gene mean $r = 0.84$) and recovers the GFI1B regulatory network without supervision (6.6-fold enrichment, $P = 3.5 \times 10^{-17}$). Two distinct attention mechanisms converge on an RNA processing module ($P = 1 \times 10^{-16}$). CDT-II establishes mechanism-oriented AI as an alternative to task-oriented approaches, revealing regulatory structure rather than merely optimizing predictions.
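As a minimal illustration of how a cross-attention map can be read as a gene-by-locus weight matrix, the sketch below computes scaled dot-product attention in pure Python. It is not the CDT-II implementation. The function names, the toy embeddings, and the choice of RNA tokens as queries attending over DNA tokens as keys are all assumptions made for this example; the abstract does not specify these details.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention_map(rna_queries, dna_keys):
    """Scaled dot-product attention weights, shape (genes x loci).

    Assumption: RNA (gene) tokens act as queries and DNA (locus) tokens
    act as keys. Each row is a distribution over loci for one gene.
    """
    scale = math.sqrt(len(dna_keys[0]))
    attn = []
    for q in rna_queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in dna_keys]
        attn.append(softmax(scores))
    return attn

# Hypothetical toy embeddings: 2 genes and 3 genomic loci, dimension 2.
rna_queries = [[1.0, 0.0], [0.0, 1.0]]
dna_keys = [[2.0, 0.0], [0.0, 2.0], [0.0, 0.0]]

attn = cross_attention_map(rna_queries, dna_keys)

# Index of the locus receiving the highest attention weight for each gene.
top_locus = [max(range(len(row)), key=row.__getitem__) for row in attn]
```

In this toy setting each row of `attn` sums to one, so the largest entry in a row identifies the locus that a given gene attends to most strongly. This is the sense in which an attention map can be inspected as candidate regulatory structure.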