We propose a novel statistical inference methodology for multiway count data corrupted by false zeros that are indistinguishable from true zero counts. Our approach zero-truncates the Poisson distribution so that all zero values are discarded. This simple truncated approach dispenses with the need to distinguish between true and false zero counts and reduces the amount of data to be processed. Inference is accomplished via tensor completion that imposes low-rank tensor structure on the Poisson parameter space. Our main result shows that an $N$-way rank-$R$ parametric tensor $\boldsymbol{\mathscr{M}}\in(0,\infty)^{I\times \cdots\times I}$ generating Poisson observations can be accurately estimated by zero-truncated Poisson regression from approximately $IR^2\log_2^2(I)$ non-zero counts under the nonnegative canonical polyadic decomposition. Our result also quantifies the error made by zero-truncating the Poisson distribution when the parameter is uniformly bounded from below. Therefore, under a low-rank multiparameter model, we propose an implementable approach guaranteed to achieve accurate regression in under-determined scenarios with substantial corruption by false zeros. Several numerical experiments are presented to explore the theoretical results.
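To make the zero-truncation idea concrete, the following is a minimal scalar sketch, not the tensor method of the paper. It assumes a single Poisson rate $\lambda$, false zeros that overwrite counts independently of their value, and the standard zero-truncated Poisson law $P(X=k \mid X>0) = \lambda^k e^{-\lambda} / \big(k!\,(1-e^{-\lambda})\big)$ for $k\ge 1$. All function names (`ztp_log_pmf`, `ztp_mle`, `sample_poisson`, `demo`) and all parameter values are hypothetical choices made for illustration.

```python
import math
import random


def ztp_log_pmf(k, lam):
    """Log-probability of k >= 1 under a Poisson(lam) law conditioned on being nonzero."""
    if k < 1 or lam <= 0:
        raise ValueError("need k >= 1 and lam > 0")
    return k * math.log(lam) - lam - math.lgamma(k + 1) - math.log1p(-math.exp(-lam))


def ztp_mle(nonzero_counts):
    """Maximum-likelihood rate from strictly positive counts.

    Solves lam / (1 - exp(-lam)) = sample mean by bisection. The left-hand
    side is increasing in lam, tends to 1 as lam -> 0, and exceeds lam,
    so the root lies in (0, mean) whenever mean > 1.
    """
    if not nonzero_counts:
        raise ValueError("need at least one nonzero count")
    mean = sum(nonzero_counts) / len(nonzero_counts)
    if mean <= 1.0:
        return 0.0  # boundary case (all counts equal 1): likelihood increases as lam -> 0
    lo, hi = 1e-12, mean
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid / -math.expm1(-mid) < mean:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def sample_poisson(lam, rng):
    """Knuth's multiplication sampler; adequate for small lam."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def demo(lam=3.0, n=20000, p_false_zero=0.4, seed=0):
    """Simulate counts, overwrite a fraction with false zeros, and compare estimators."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n):
        x = sample_poisson(lam, rng)
        if rng.random() < p_false_zero:
            x = 0  # false zero, indistinguishable from a true zero
        counts.append(x)
    naive = sum(counts) / n                    # ordinary Poisson MLE, biased by false zeros
    nonzero = [c for c in counts if c > 0]     # zero-truncation: drop every zero
    return naive, ztp_mle(nonzero)


if __name__ == "__main__":
    naive, truncated = demo()
    print(f"naive mean: {naive:.3f}, zero-truncated MLE: {truncated:.3f}")
```

In this toy setting the ordinary sample mean is pulled toward zero by the corrupted entries, while the truncated estimator only uses the surviving nonzero counts and therefore never needs to know which zeros were false. The paper's setting replaces the single rate by a low-rank parameter tensor and the scalar fit by regression under the nonnegative canonical polyadic decomposition.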