Tensor classification is gaining importance across fields, yet handling partially observed data remains challenging. In this paper, we introduce a novel approach to tensor classification with incomplete data, framed within high-dimensional tensor linear discriminant analysis. Specifically, we consider a high-dimensional tensor predictor with missing observations under the Missing Completely at Random (MCAR) assumption and employ the Tensor Gaussian Mixture Model (TGMM) to capture the relationship between the tensor predictor and the class label. We propose a Tensor Linear Discriminant Analysis with Missing Data (Tensor LDA-MD) algorithm, which handles high-dimensional tensor predictors with missing entries by leveraging the decomposable low-rank structure of the discriminant tensor. We establish convergence rates for the estimation error of the discriminant tensor with incomplete data and minimax optimal bounds for the misclassification rate, addressing key gaps in the literature. Additionally, we derive large deviation bounds for the generalized mode-wise sample covariance matrix and its inverse, which are crucial tools in our analysis and are of independent interest. Our method performs well in simulations and real data analysis, even when a substantial proportion of the data is missing.
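To make the role of the MCAR assumption concrete, the following is a minimal sketch of linear discriminant analysis with missing entries in the simpler vector (not tensor) setting. It is not the Tensor LDA-MD algorithm described above: it omits the mode-wise covariance structure, the low-rank discriminant tensor, and any high-dimensional regularization. It only illustrates the general idea of estimating means and a pooled covariance from whichever entries are observed, using per-pair observation counts as the normalizer. The function names `masked_mean`, `fit_lda_missing`, and `predict` are hypothetical, and the sketch assumes two classes labeled 0 and 1 and that every pair of coordinates is observed together at least once.

```python
import numpy as np

def masked_mean(X, S):
    """Entry-wise mean over observed values only (S is a boolean mask)."""
    return np.where(S, X, 0.0).sum(axis=0) / S.sum(axis=0)

def fit_lda_missing(X, S, y):
    """Two-class LDA from partially observed rows of X (illustrative only).

    X : (n, p) array; values at unobserved positions are ignored (may be NaN).
    S : (n, p) boolean array; True where X is observed.
    y : (n,) integer array of labels in {0, 1}.
    """
    # Class means estimated from the observed entries of each coordinate.
    mus = np.stack([masked_mean(X[y == k], S[y == k]) for k in (0, 1)])
    # Residuals around the class mean, set to zero where unobserved.
    R = np.where(S, X - mus[y], 0.0)
    # Number of samples in which coordinates j and k are both observed.
    Sf = S.astype(float)
    pair_counts = Sf.T @ Sf
    # Pooled covariance normalized pair by pair.
    # Note: this matrix is not guaranteed to be positive semidefinite.
    sigma = (R.T @ R) / pair_counts
    # Discriminant direction: sigma^{-1} (mu_1 - mu_0).
    beta = np.linalg.solve(sigma, mus[1] - mus[0])
    prior = np.array([np.mean(y == 0), np.mean(y == 1)])
    return mus, beta, prior

def predict(x, mus, beta, prior):
    """Classify a fully observed vector x with the fitted linear rule."""
    score = (x - mus.mean(axis=0)) @ beta + np.log(prior[1] / prior[0])
    return int(score > 0)
```

Normalizing each covariance entry by its own pair count is what keeps the estimate consistent when entries are dropped independently of the data. In high dimensions the plain `np.linalg.solve` step would need to be replaced by a structured or regularized estimator, which is where the tensor structure exploited by the paper comes in.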