We introduce the Poisson tensor completion (PTC) estimator, which exploits inter-sample relationships to compute a low-rank Poisson tensor decomposition of the frequency histogram of samples from a multivariate distribution. Our crucial observation is that the histogram bins form a spatial partition of counts and can therefore be identified with a spatial non-homogeneous Poisson process. The Poisson tensor decomposition completes the mean measure over all bins, including those containing few or no samples, and this completion yields our proposed estimator. A Poisson tensor decomposition models the underlying distribution of the count data and guarantees non-negative estimated values, obviating the need for additional non-negativity constraints. Furthermore, we demonstrate that the PTC estimator substantially improves on standard histogram-based estimators for sub-Gaussian probability distributions because of the concentration-of-norm phenomenon.
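The pipeline described above (histogram, low-rank Poisson decomposition, completed mean measure, density estimate) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a canonical polyadic (CP) model fitted with KL-type multiplicative updates, uniform bins from NumPy's `histogramdd`, and hypothetical helper names (`_outer_columns`, `poisson_cp`, `ptc_density`). The rank, bin count, and iteration count are arbitrary illustrative choices.

```python
import numpy as np


def _outer_columns(factors):
    """Column-wise outer product; result has shape (n_1, ..., n_m, rank)."""
    K = factors[0]
    for F in factors[1:]:
        K = K[..., None, :] * F
    return K


def poisson_cp(counts, rank, n_iter=200, eps=1e-10, seed=0):
    """Fit a non-negative CP model to a count tensor (at least two modes)
    with multiplicative updates for the Poisson (KL) loss.
    Returns the completed mean tensor. Illustrative only."""
    rng = np.random.default_rng(seed)
    X = np.asarray(counts, dtype=float)
    # Strictly positive start, so factors stay non-negative under the updates.
    factors = [rng.random((n, rank)) + 0.1 for n in X.shape]
    for _ in range(n_iter):
        for k in range(X.ndim):
            M = _outer_columns(factors).sum(axis=-1)        # current mean tensor
            # Ratio counts / mean, unfolded along mode k.
            R = np.moveaxis(X / (M + eps), k, 0).reshape(X.shape[k], -1)
            others = [F for j, F in enumerate(factors) if j != k]
            KR = _outer_columns(others).reshape(-1, rank)   # Khatri-Rao product
            factors[k] *= (R @ KR) / (KR.sum(axis=0) + eps)
    return _outer_columns(factors).sum(axis=-1)


def ptc_density(samples, bins=8, rank=2):
    """Histogram the samples, complete the Poisson mean over all bins,
    and normalize to a piecewise-constant density estimate."""
    samples = np.asarray(samples, dtype=float)
    counts, edges = np.histogramdd(samples, bins=bins)      # uniform bins
    mean = poisson_cp(counts, rank)
    volume = float(np.prod([e[1] - e[0] for e in edges]))   # volume of one bin
    density = mean / (samples.shape[0] * volume)
    return density, counts, volume


# Usage: samples from a 3-D standard normal (sub-Gaussian) distribution.
rng = np.random.default_rng(1)
samples = rng.standard_normal((2000, 3))
density, counts, volume = ptc_density(samples)
print("empty bins:", int((counts == 0).sum()), "of", counts.size)
print("estimated mass on empty bins:", float(density[counts == 0].sum() * volume))
```

In this sketch the raw histogram assigns zero density to every empty bin, whereas the completed mean tensor is non-negative everywhere and can place mass on those bins without any explicit non-negativity constraint, since the multiplicative updates keep positive factors non-negative.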