Gene Regulatory Network Inference (GRNI) aims to identify causal relationships among genes from gene expression data, providing insight into regulatory mechanisms. A significant yet often overlooked challenge is selection bias, in which only cells meeting specific criteria, such as gene expression thresholds, survive or are observed. This distorts the true joint distribution of genes and thus biases GRNI results. Furthermore, gene expression is influenced by latent confounders, such as non-coding RNAs, which add further complexity to GRNI. To address these challenges, we propose GISL (Gene Regulatory Network Inference in the presence of Selection bias and Latent confounders), a novel algorithm for inferring true regulatory relationships when both selection and confounding are present. Leveraging data from multiple gene perturbation experiments, we show that the true regulatory relationships, as well as the selection processes and latent confounders, can be partially identified without strong parametric models and under mild graphical assumptions. Experimental results on both synthetic and real-world single-cell gene expression datasets show that GISL outperforms existing methods.
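The way selection can distort the joint distribution of genes may be easier to see in a small simulation. The sketch below is not GISL and is not taken from the paper; it is a minimal illustration under assumed conditions: two genes with independent Gaussian expression, and a hypothetical selection rule that keeps a cell only when the sum of the two expression levels exceeds a threshold. The function names, the threshold, and the sample size are all illustrative choices.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n_cells=20000, threshold=1.0, seed=0):
    """Compare gene-gene correlation before and after selection.

    Assumption (hypothetical, for illustration only): a cell is observed
    only if gene1 + gene2 exceeds `threshold`.
    """
    rng = random.Random(seed)
    # Two genes with no regulatory link: expression levels drawn independently.
    gene1 = [rng.gauss(0.0, 1.0) for _ in range(n_cells)]
    gene2 = [rng.gauss(0.0, 1.0) for _ in range(n_cells)]
    # Selection step: only cells passing the threshold are kept.
    kept = [(a, b) for a, b in zip(gene1, gene2) if a + b > threshold]
    kept1, kept2 = zip(*kept)
    return pearson(gene1, gene2), pearson(kept1, kept2), len(kept)


r_all, r_selected, n_kept = simulate()
print(f"all cells:      r = {r_all:+.3f} (n = 20000)")
print(f"selected cells: r = {r_selected:+.3f} (n = {n_kept})")
```

In the full population the correlation is close to zero, while among the selected cells it is clearly negative even though neither gene regulates the other. A method that ignores selection could read such a dependence as a regulatory edge.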