We introduce a logistic regression model for data pairs consisting of a binary response and a covariate residing in a non-Euclidean metric space that lacks vector structure. Based on the proposed model, we also develop a binary classifier for non-Euclidean objects. We propose a maximum likelihood estimator for the non-Euclidean regression coefficient in the model and provide upper bounds on its estimation error under various metric entropy conditions that quantify the complexity of the underlying metric space. Matching lower bounds are derived for several important metric spaces commonly encountered in statistics, establishing the optimality of the proposed estimator in these spaces. Similarly, an upper bound on the excess risk of the developed classifier is provided for general metric spaces. For Riemannian manifolds, a finer upper bound and a matching lower bound are established, which proves the optimality of the proposed classifier. We investigate the numerical performance of the proposed estimator and classifier through simulation studies and illustrate their practical merits through an application to task-related fMRI data.