This paper addresses a regression problem in which the output labels are obtained by sensing the magnitude of a phenomenon. A low label value can mean either that the actual magnitude of the phenomenon was low or that the sensor made an incomplete observation. Because an incomplete observation can yield a low label even when the actual magnitude was high, the labels are biased toward lower values, and so is any model learned from them. Moreover, because incomplete observations carry no tag indicating their incompleteness, we can neither eliminate nor impute them. To address this issue, we propose a learning algorithm that explicitly models incomplete observations as labels corrupted by asymmetric noise that always takes a negative value. We show that our algorithm is unbiased, as if it were trained on uncorrupted data containing no incomplete observations. We demonstrate the advantages of our algorithm through numerical experiments.
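The downward bias described above can be reproduced with a small simulation. The sketch below is not the proposed algorithm. It only fits ordinary least squares to synthetic labels, once without corruption and once after a never-positive noise term is added to a random subset of labels. The linear ground truth, the 30% incompleteness rate, the exponential noise magnitude, and all variable names are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic ground truth (assumed for illustration): y = 2x + 1 plus small symmetric noise.
n = 20_000
x = rng.uniform(0.0, 1.0, size=n)
y_clean = 2.0 * x + 1.0 + rng.normal(0.0, 0.1, size=n)

# Hypothetical corruption model: 30% of observations are incomplete, and each
# incomplete observation loses a nonnegative amount. No flag marks which ones.
incomplete = rng.random(n) < 0.3
drop = rng.exponential(1.0, size=n)
y_observed = y_clean - incomplete * drop  # asymmetric noise, never positive


def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    design = np.column_stack([x, np.ones_like(x)])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef  # array([slope, intercept])


slope_clean, intercept_clean = fit_line(x, y_clean)
slope_obs, intercept_obs = fit_line(x, y_observed)

print(f"clean fit:    slope={slope_clean:.3f}, intercept={intercept_clean:.3f}")
print(f"observed fit: slope={slope_obs:.3f}, intercept={intercept_obs:.3f}")
```

In this setup the corruption is independent of `x`, so the standard fit mainly shifts the intercept downward by roughly the mean of the subtracted noise. Since the corrupted labels carry no indicator, the `incomplete` mask is unavailable to a learner in practice, which is why filtering or imputation is not an option.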