In numerous applications, binary reactions or event counts are observed and stored in high-order tensors. Tensor decompositions (TDs) are a powerful tool for handling such high-dimensional, sparse data. However, many traditional TDs are explicitly or implicitly built on the Gaussian distribution, which is unsuitable for discrete data. Moreover, most TDs rely on predefined multi-linear structures, such as the CP and Tucker formats, and may therefore be too restrictive for complex real-world datasets. To address these issues, we propose ENTED, an \underline{E}fficient \underline{N}onparametric \underline{TE}nsor \underline{D}ecomposition for binary and count tensors. Specifically, we first replace the traditional multi-linear structure with a nonparametric Gaussian process (GP). Next, we use Pólya-Gamma augmentation, which provides a unified framework for building conjugate models for binary and count distributions. Finally, to address the computational cost of GPs, we incorporate sparse orthogonal variational inference with inducing points, which offers a more effective covariance approximation within GPs and enables stochastic natural gradient updates for nonparametric models. We evaluate our model on several real-world tensor completion tasks covering binary and count datasets. The results demonstrate both better predictive performance and computational advantages for the proposed model.
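To make the role of the augmentation step concrete, the following minimal sketch numerically checks the standard Pólya-Gamma identity for the logistic function, $\sigma(\psi) = \tfrac{1}{2} e^{\psi/2}\, \mathbb{E}_{\omega \sim \mathrm{PG}(1,0)}\left[e^{-\omega \psi^2/2}\right]$. Conditioned on $\omega$, the likelihood is Gaussian in form in $\psi$, which is what makes it conjugate to a GP prior. This is not the authors' implementation: the function names, the truncation level `n_terms`, and the sample size are assumptions chosen for illustration, and the sampler uses a truncated version of the infinite-sum-of-gammas representation of the Pólya-Gamma distribution rather than an exact sampler.

```python
import numpy as np


def sample_pg_truncated(b, n_samples, n_terms=200, rng=None):
    """Approximate draws from PG(b, 0) via a truncated sum of gammas.

    Uses omega = (1 / (2 pi^2)) * sum_k g_k / (k - 1/2)^2 with
    g_k ~ Gamma(b, 1), keeping only the first `n_terms` terms
    (an approximation; the exact representation is an infinite sum).
    """
    rng = np.random.default_rng() if rng is None else rng
    k = np.arange(1, n_terms + 1)
    g = rng.gamma(shape=b, scale=1.0, size=(n_samples, n_terms))
    return (g / (k - 0.5) ** 2).sum(axis=1) / (2.0 * np.pi ** 2)


def sigmoid(psi):
    """Logistic function, the Bernoulli likelihood for a positive label."""
    return 1.0 / (1.0 + np.exp(-psi))


def sigmoid_via_pg(psi, omega):
    """Monte Carlo estimate of sigmoid(psi) through the PG identity.

    Inside the expectation the integrand exp(psi/2 - omega * psi^2 / 2)
    is an unnormalized Gaussian in psi for each fixed omega.
    """
    return 0.5 * np.exp(0.5 * psi) * np.mean(np.exp(-0.5 * omega * psi ** 2))


rng = np.random.default_rng(0)
omega = sample_pg_truncated(b=1.0, n_samples=20000, rng=rng)

# Compare the direct and augmented evaluations at a few test points.
for psi in (-2.0, 0.5, 1.5):
    print(psi, sigmoid(psi), sigmoid_via_pg(psi, omega))
```

The two printed values for each `psi` should agree up to Monte Carlo and truncation error. The same identity with a different shape parameter `b` covers negative-binomial count likelihoods, which is why a single augmentation scheme can serve both binary and count tensors.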