Modern approaches to supervised learning, such as deep neural networks (DNNs), typically assume implicitly that observed responses are statistically independent. In contrast, correlated data are prevalent in real-life large-scale applications, with typical sources of correlation including spatial, temporal and clustering structures. DNNs either ignore these correlations or handle them with ad hoc solutions developed for specific use cases. We propose to use the mixed models framework to handle correlated data in DNNs. By treating the effects underlying the correlation structure as random effects, mixed models are able to avoid overfitted parameter estimates and ultimately yield better predictive performance. The key to combining mixed models and DNNs is using the Gaussian negative log-likelihood (NLL) as a natural loss function that is minimized with DNN machinery, including stochastic gradient descent (SGD). Since the NLL does not decompose into a sum over observations the way standard DNN loss functions do, the use of SGD with NLL presents some theoretical and implementation challenges, which we address. Our approach, which we call LMMNN, is demonstrated to improve performance over natural competitors in various correlation scenarios on diverse simulated and real datasets. Our focus is on a regression setting and tabular datasets, but we also show some results for classification. Our code is available at https://github.com/gsimchoni/lmmnn.
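To illustrate why the Gaussian NLL does not decompose across observations, the following is a minimal NumPy sketch, not the LMMNN implementation. It assumes the simplest setting: a single random intercept per cluster, so that y = f(X) + Zb + e with b ~ N(0, sigma2_b I) and e ~ N(0, sigma2_e I), and marginal covariance V = sigma2_b ZZ' + sigma2_e I. The function name `gaussian_nll`, its arguments, and the toy data are hypothetical, and `f_hat` stands in for the output of a DNN.

```python
import numpy as np

def gaussian_nll(y, f_hat, cluster_ids, sigma2_e, sigma2_b):
    """Gaussian NLL of y given fixed-part predictions f_hat, assuming
    one random intercept per cluster (an illustrative assumption)."""
    n = y.shape[0]
    # Z is the n x q one-hot matrix of cluster membership.
    clusters, idx = np.unique(cluster_ids, return_inverse=True)
    Z = np.zeros((n, clusters.size))
    Z[np.arange(n), idx] = 1.0
    # Marginal covariance of y: observations in the same cluster are correlated.
    V = sigma2_b * (Z @ Z.T) + sigma2_e * np.eye(n)
    r = y - f_hat
    _, logdet = np.linalg.slogdet(V)
    quad = r @ np.linalg.solve(V, r)
    return 0.5 * (quad + logdet + n * np.log(2.0 * np.pi))

# Toy data: four observations that all belong to one cluster.
y = np.array([1.0, 2.0, 3.0, 4.0])
f_hat = np.zeros(4)
ids = np.array([0, 0, 0, 0])

full = gaussian_nll(y, f_hat, ids, 1.0, 1.0)
# Splitting the data into two "mini-batches" and summing their NLLs
# ignores the correlation between the halves.
halves = (gaussian_nll(y[:2], f_hat[:2], ids[:2], 1.0, 1.0)
          + gaussian_nll(y[2:], f_hat[2:], ids[2:], 1.0, 1.0))
print(full, halves)  # the two values differ when sigma2_b > 0
```

With `sigma2_b = 0` the covariance is diagonal and the NLL reduces to the usual sum of per-observation Gaussian losses. With `sigma2_b > 0`, the sum of mini-batch NLLs is no longer equal to the full-data NLL, which is the difficulty with SGD that the abstract refers to.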