In medical genetics, each genetic variant is evaluated as an independent entity with respect to its clinical importance. However, in most complex diseases, combinations of variants in specific gene networks, rather than the presence of any single variant, predominate. For such diseases, disease status can be assessed by considering how well a team of specific variants performs together. We propose a method based on high-dimensional modelling that analyses all the variants in a gene network together. To evaluate the method, we selected two gene networks, mTOR and TGF-β. For each pathway, we generated 400 control and 400 patient samples. The mTOR and TGF-β pathways contain 31 and 93 genes of varying sizes, respectively. We produced a Chaos Game Representation image for each gene sequence to obtain a 2-D binary pattern. These patterns were stacked in succession to form a 3-D tensor for each gene network. Features for each data sample were obtained by applying Enhanced Multivariance Products Representation to the 3-D data. The feature vectors were split into training and testing sets, and the training vectors were used to train a Support Vector Machine classification model. We achieved classification accuracies of more than 96% and 99% for the mTOR and TGF-β networks, respectively, using a limited number of training samples.
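As an illustration of the first two steps described in the abstract, the following is a minimal sketch of how a nucleotide sequence can be turned into a binary Chaos Game Representation image and how several such images can be stacked into a 3-D structure. The corner assignment (A, C, G, T), the image size, and the function names `cgr_binary_image` and `stack_gene_images` are assumptions made for this example; the abstract does not state which conventions were used. The Enhanced Multivariance Products Representation and Support Vector Machine steps are not shown.

```python
# Assumed corner convention for the unit square; other assignments are in use.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_binary_image(sequence, size=64):
    """Return a size x size binary image (list of rows) for one sequence.

    Starting from the centre of the unit square, each nucleotide moves the
    current point halfway towards its corner; the pixel containing the new
    point is set to 1.
    """
    image = [[0] * size for _ in range(size)]
    x, y = 0.5, 0.5
    for base in sequence.upper():
        corner = CORNERS.get(base)
        if corner is None:
            continue  # skip symbols outside A/C/G/T, such as N
        x = (x + corner[0]) / 2.0
        y = (y + corner[1]) / 2.0
        col = min(int(x * size), size - 1)
        row = min(int(y * size), size - 1)
        image[row][col] = 1
    return image


def stack_gene_images(sequences, size=64):
    """Stack one binary image per gene into a genes x size x size structure."""
    return [cgr_binary_image(seq, size) for seq in sequences]


# Toy input: three short made-up sequences standing in for the genes of a network.
tensor = stack_gene_images(["ACGT", "AAAA", "GGTTCA"], size=8)
```

Because every image has the same fixed size regardless of sequence length, genes of varying sizes can be stacked along a common axis, which is what makes a single 3-D structure per gene network possible.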