We introduce a logistic regression model for data pairs consisting of a binary response and a covariate residing in a non-Euclidean metric space without vector structure. Based on the proposed model, we also develop a binary classifier for non-Euclidean objects. We propose a maximum likelihood estimator for the non-Euclidean regression coefficient in the model and provide upper bounds on the estimation error under various metric entropy conditions that quantify the complexity of the underlying metric space. Matching lower bounds are derived for important metric spaces commonly seen in statistics, establishing optimality of the proposed estimator in such spaces. Similarly, an upper bound on the excess risk of the developed classifier is provided for general metric spaces. A finer upper bound and a matching lower bound, and thus optimality of the proposed classifier, are established for Riemannian manifolds. We investigate the numerical performance of the proposed estimator and classifier via simulation studies and illustrate their practical merits via an application to task-related fMRI data.