Recent work has shown that imposing tensor structure on the coefficient tensor in regression problems can lead to more reliable parameter estimation and lower sample complexity than vector-based methods. This work investigates a new low-rank tensor model, called Low Separation Rank (LSR), in Generalized Linear Model (GLM) problems. The LSR model, which generalizes the well-known Tucker and CANDECOMP/PARAFAC (CP) models and is a special case of the Block Tensor Decomposition (BTD) model, is imposed on the coefficient tensor of the GLM. This work proposes a block coordinate descent algorithm for parameter estimation in LSR-structured tensor GLMs. Most importantly, it derives a minimax lower bound on the error threshold for estimating the coefficient tensor in LSR tensor GLM problems. The minimax bound is proportional to the intrinsic degrees of freedom in the LSR tensor GLM problem, suggesting that its sample complexity may be significantly lower than that of vectorized GLMs. This result can also be specialized to lower-bound the estimation error in CP- and Tucker-structured GLMs. The derived bounds are comparable to tight bounds in the literature for Tucker linear regression, and the tightness of the minimax lower bound is further assessed numerically. Finally, numerical experiments on synthetic datasets demonstrate the efficacy of the proposed LSR tensor model for three regression types (linear, logistic, and Poisson). Experiments on a collection of medical imaging datasets show that the LSR model is more useful than other tensor models (Tucker and CP) on real, imbalanced data with limited available samples.