Mining large datasets and obtaining calibrated predictions from them is of immediate relevance and utility in reliable deep learning. In our work, we develop methods for deep-neural-network-based inference on such datasets, for example gene expression data. Unlike typical deep learning methods, our inferential technique achieves state-of-the-art accuracy while also providing explanations and reporting uncertainty estimates. We adopt the quantile regression framework to predict full conditional quantiles for a given set of housekeeping gene expressions. Conditional quantiles, in addition to providing rich interpretations of the predictions, are robust to measurement noise. Our technique is particularly consequential in high-throughput genomics, an area that is ushering in a new era of personalized health care and targeted drug design and delivery. However, the check loss, which drives the estimation process in quantile regression, is not differentiable at the origin. We propose log-cosh as a smooth alternative to the check loss. We apply our methods to the GEO microarray dataset and extend them to the binary classification setting. Furthermore, we investigate other consequences of the smoothness of the loss, such as faster convergence. We also apply the classification framework to other healthcare inference tasks, such as heart disease, breast cancer, and diabetes. As a test of the generalization ability of our framework, we additionally evaluate it on non-healthcare datasets for regression and classification tasks.