This paper introduces a general argumentation framework for regression in the errors-in-variables regime. The framework is fully flexible with respect to the dimensionality of the data, the type of error probability density, and the model type (linear or nonlinear), and it avoids the explicit definition of loss functions. Within this framework, we further introduce model fitting for partially unpaired data, i.e., data in which the pairing information between input and output is lost within given groups (a semi-supervised setting). This is achieved by constructing mixture model densities that directly model the loss of pairing information and thereby allow for inference. A numerical simulation study illustrates linear and nonlinear model fits, and a real-data study based on life expectancy data from the World Bank applies a multiple linear regression model. The results indicate that high-quality model fitting is possible with partially unpaired data, which opens up new applications in which pairing information is lost accidentally or deliberately.
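The mixture-density idea for unpaired groups can be made concrete with a small sketch. The code below is an illustration under simplifying assumptions, not the construction used in the paper: it assumes a scalar linear model `y = a*x + b`, Gaussian errors with known standard deviations `sx` and `sy` on input and output, a flat prior on the latent input (which yields the effective residual variance `sy**2 + (a*sx)**2`), and it treats each output in an unpaired group as drawn from an equal-weight mixture over that group's inputs. All names (`neg_log_likelihood`, `simulate`, `groups`) are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp


def neg_log_likelihood(theta, paired, groups, sx, sy):
    """Negative log-likelihood for paired data plus unpaired groups.

    Assumption: linear model with Gaussian errors on x and y; the latent
    input is marginalized under a flat prior, so the residual y - a*x - b
    has variance sy**2 + (a*sx)**2.
    """
    a, b = theta
    var = sy**2 + (a * sx) ** 2

    def log_norm(r):
        # Log-density of a zero-mean Gaussian with variance `var`.
        return -0.5 * (np.log(2.0 * np.pi * var) + r**2 / var)

    x, y = paired
    ll = log_norm(y - a * x - b).sum()

    for gx, gy in groups:
        # Pairing is unknown: compare every output with every input of the
        # group and average the densities (equal-weight mixture).
        r = gy[:, None] - a * gx[None, :] - b
        ll += (logsumexp(log_norm(r), axis=1) - np.log(len(gx))).sum()
    return -ll


# Synthetic data (all values below are illustrative assumptions).
rng = np.random.default_rng(0)
a_true, b_true, sx, sy = 2.0, 1.0, 0.1, 0.2


def simulate(n):
    x_latent = rng.uniform(0.0, 5.0, n)
    x = x_latent + rng.normal(0.0, sx, n)
    y = a_true * x_latent + b_true + rng.normal(0.0, sy, n)
    return x, y


paired = simulate(20)
groups = []
for _ in range(10):
    gx, gy = simulate(4)
    groups.append((gx, rng.permutation(gy)))  # shuffle to destroy the pairing

# Start from an ordinary least-squares fit to the paired data only.
start = np.polyfit(*paired, 1)
fit = minimize(neg_log_likelihood, start,
               args=(paired, groups, sx, sy), method="Nelder-Mead")
a_hat, b_hat = fit.x
print(f"slope={a_hat:.3f}, intercept={b_hat:.3f}")
```

The per-output mixture ignores the constraint that each input in a group is matched to exactly one output, so it is a simplification of a full model over permutations. Because mixture likelihoods can have several local optima, the sketch initializes the optimizer from the paired data alone.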