DNA microarray gene-expression data have been widely used to identify cancerous gene signatures, and microarrays can increase the accuracy of cancer diagnosis and prognosis. However, analyzing the large amount of gene expression data from microarray chips poses a challenge for current machine learning research. One of the challenges in classifying healthy and cancerous tissues is the high dimensionality of gene expression data, which decreases classification accuracy. This research applies a hybrid model of a Genetic Algorithm and a Neural Network to address this problem during the selection of a subset of informative genes. A Genetic Algorithm (GA) first reduces dimensionality through feature selection, and a Multi-Layer Perceptron Neural Network (MLP) is then applied to classify tissue samples using the selected genes. Performance is evaluated in terms of classification accuracy and the number of selected genes. Experimental results show that the proposed method achieves high accuracy with the smallest number of selected genes in comparison with other machine learning algorithms.
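The GA-plus-MLP pipeline described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the study: the synthetic data (`make_data`, with three informative genes out of twenty), the small hand-written perceptron with one hidden layer, the fitness function (hold-out accuracy minus a penalty on the fraction of genes kept), and all GA settings such as population size, tournament selection, one-point crossover, and mutation rate.

```python
import math
import random

def make_data(rng, n_samples=40, n_genes=20, informative=(0, 1, 2)):
    """Synthetic stand-in for microarray data: rows are samples, columns are genes."""
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        for g in informative:  # only these hypothetical genes carry class signal
            row[g] += 2.0 * label
        X.append(row)
        y.append(label)
    return X, y

def forward(model, x):
    """Return hidden activations and the output pre-activation for one sample."""
    W1, W2 = model
    h = [math.tanh(w[-1] + sum(wi * xi for wi, xi in zip(w, x))) for w in W1]
    return h, W2[-1] + sum(wi * hi for wi, hi in zip(W2, h))

def train_mlp(X, y, hidden=3, epochs=20, lr=0.1, seed=0):
    """Train a one-hidden-layer perceptron with plain stochastic gradient descent."""
    rng = random.Random(seed)  # fixed seed keeps the fitness deterministic per mask
    n_in = len(X[0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(hidden)]
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden + 1)]
    for _ in range(epochs):
        for x, t in zip(X, y):
            h, z = forward((W1, W2), x)
            out = 1.0 / (1.0 + math.exp(-max(-60.0, min(60.0, z))))
            d_out = out - t  # cross-entropy gradient at the output unit
            for j in range(hidden):
                d_h = d_out * W2[j] * (1.0 - h[j] ** 2)
                W2[j] -= lr * d_out * h[j]
                for k in range(n_in):
                    W1[j][k] -= lr * d_h * x[k]
                W1[j][-1] -= lr * d_h
            W2[-1] -= lr * d_out
    return W1, W2

def accuracy(mask, X, y, n_train=28):
    """Hold-out accuracy of an MLP that sees only the genes switched on in mask."""
    cols = [i for i, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0
    Xs = [[row[i] for i in cols] for row in X]
    model = train_mlp(Xs[:n_train], y[:n_train])
    hits = sum((forward(model, x)[1] > 0) == bool(t)
               for x, t in zip(Xs[n_train:], y[n_train:]))
    return hits / (len(y) - n_train)

def fitness(mask, X, y, penalty=0.2):
    """Reward accuracy and penalise the fraction of genes kept."""
    if not any(mask):
        return -1.0  # an empty gene subset is never acceptable
    return accuracy(mask, X, y) - penalty * sum(mask) / len(mask)

def run_ga(X, y, rng, pop_size=12, generations=8, p_mut=0.05):
    """Evolve bit masks over genes; return the best mask and best fitness per generation."""
    n_genes, cache = len(X[0]), {}
    def score(mask):
        if mask not in cache:
            cache[mask] = fitness(mask, X, y)
        return cache[mask]
    pop = [tuple(int(rng.random() < 0.2) for _ in range(n_genes))
           for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        history.append(score(pop[0]))
        nxt = [pop[0]]  # elitism: the best mask always survives
        while len(nxt) < pop_size:
            a = max(rng.sample(pop, 3), key=score)  # tournament selection
            b = max(rng.sample(pop, 3), key=score)
            cut = rng.randrange(1, n_genes)         # one-point crossover
            child = a[:cut] + b[cut:]
            nxt.append(tuple(bit ^ (rng.random() < p_mut) for bit in child))
        pop = nxt
    best = max(pop, key=score)
    history.append(score(best))
    return best, history

rng = random.Random(42)
X, y = make_data(rng)
best_mask, history = run_ga(X, y, rng)
print("selected genes:", [i for i, bit in enumerate(best_mask) if bit])
print("hold-out accuracy:", accuracy(best_mask, X, y))
```

The penalty term is one simple way to trade accuracy against subset size, matching the two evaluation criteria named in the abstract. A real experiment would likely use cross-validation rather than a single hold-out split, and a tested MLP implementation rather than this hand-written one.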