In many modern regression applications, the response consists of multiple categorical random variables whose joint probability mass function depends on a common set of predictors. In this article, we propose a new method for modeling such a probability mass function in settings where the number of response variables, the number of categories per response, and the dimension of the predictor are large. Our method relies on a functional probability tensor decomposition: a decomposition of a tensor-valued function such that its range is a restricted set of low-rank probability tensors. This decomposition is motivated by the connection between the conditional independence of responses, or lack thereof, and their probability tensor rank. We show that the model implied by such a low-rank functional probability tensor decomposition can be interpreted as a mixture of regressions and can thus be fit using maximum likelihood. We derive an efficient and scalable penalized expectation maximization algorithm to fit this model and examine its statistical properties. We demonstrate the encouraging performance of our method through both simulation studies and an application to modeling the functional classes of genes.
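To make the mixture-of-regressions interpretation concrete, the following is a minimal sketch, not the authors' implementation. It assumes a rank-R decomposition in which the conditional probability tensor is a weighted sum of R rank-one tensors, each the outer product of per-response category probabilities. The multinomial logistic form of each component, the predictor-independent mixture weights, and all names (`conditional_pmf_tensor`, `weights`, `coefs`) are illustrative assumptions rather than details stated in the abstract.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax of a 1-D array."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def conditional_pmf_tensor(x, weights, coefs):
    """Joint pmf tensor of (Y_1, ..., Y_q) given a predictor vector x.

    x       : (p,) predictor vector.
    weights : (R,) mixture weights summing to one (assumed constant in x).
    coefs   : list of q arrays; coefs[j] has shape (R, c_j, p) and holds the
              hypothetical multinomial logistic coefficients of response j
              in each of the R mixture components.
    Returns an array of shape (c_1, ..., c_q) whose entries sum to one.
    """
    shape = tuple(B.shape[1] for B in coefs)
    tensor = np.zeros(shape)
    for r in range(weights.shape[0]):
        # Build one rank-one term: weight times the outer product of the
        # per-response category probabilities within component r.
        component = np.array(weights[r])
        for B in coefs:
            probs = softmax(B[r] @ x)  # (c_j,) probabilities for response j
            component = np.multiply.outer(component, probs)
        tensor += component
    return tensor


# Example: q = 3 responses with 3, 2, and 5 categories, p = 4 predictors, R = 2.
rng = np.random.default_rng(0)
p, R, cats = 4, 2, (3, 2, 5)
weights = np.array([0.6, 0.4])
coefs = [rng.normal(size=(R, c, p)) for c in cats]
x = rng.normal(size=p)
P = conditional_pmf_tensor(x, weights, coefs)
```

With R = 1 the tensor reduces to a single outer product, so the responses are conditionally independent given the predictors; allowing R > 1 is what lets the model capture dependence among the responses.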