Monitoring microbiological behavior in water is crucial for managing the public health risk posed by waterborne pathogens. Quantifying the concentrations of microorganisms in water remains challenging, however, because the concentrations of many pathogens in water samples often fall below the quantification limit, producing censored data. To enable statistical analysis based on quantitative values, the true values of non-detected measurements must be estimated with high precision. The Tobit model is a well-known linear regression model for analyzing censored data. One drawback of the Tobit model is that only the target variable is allowed to be censored. In this study, we devised a novel extension of the classical Tobit model, called the \emph{multi-target Tobit model}, which handles multiple censored variables simultaneously by introducing multiple target variables. To fit the new model, we developed a numerically stable optimization algorithm on a rigorous theoretical basis. Experiments on several real-world water quality datasets provided evidence that estimating multiple columns jointly offers a substantial advantage over estimating them separately.
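For readers unfamiliar with the classical model, the following is a minimal sketch of the negative log-likelihood of a single-target, left-censored Tobit regression. It is not the multi-target model or the optimization algorithm proposed in this study. The function names, the use of `None` to mark a non-detected measurement, and the single shared quantification limit `limit` are all assumptions made for illustration.

```python
import math


def normal_logpdf(z):
    """Log-density of the standard normal distribution at z."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def normal_cdf(z):
    """Cumulative distribution function of the standard normal at z."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def tobit_neg_loglik(w, b, sigma, xs, ys, limit):
    """Negative log-likelihood of a left-censored Tobit model (illustrative).

    w, b   : regression weights and intercept
    sigma  : noise standard deviation (must be positive)
    xs     : list of feature vectors
    ys     : observed targets; None marks a value below `limit` (assumed encoding)
    limit  : quantification limit shared by all samples (assumption)
    """
    total = 0.0
    for x, y in zip(xs, ys):
        mu = sum(wi * xi for wi, xi in zip(w, x)) + b
        if y is None:
            # Censored sample: only P(latent value <= limit) is known.
            p = normal_cdf((limit - mu) / sigma)
            total -= math.log(max(p, 1e-300))  # guard against log(0)
        else:
            # Detected sample: ordinary Gaussian regression term.
            total -= normal_logpdf((y - mu) / sigma) - math.log(sigma)
    return total
```

Minimizing this quantity over `w`, `b`, and `sigma` with any general-purpose optimizer gives the classical Tobit fit. Note that only the target `ys` may contain censored entries here, which is the limitation described above.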