Generalized Score Matching: Beyond The IID Case

Score matching is an estimation procedure developed for statistical models whose probability density function is known up to proportionality but whose normalizing constant is intractable. For such models, maximum likelihood estimation is difficult or impossible to implement. To date, nearly all applications of score matching have focused on continuous IID (independent and identically distributed) models. Motivated by various data modelling problems for which the continuity assumption and/or the IID assumption is not appropriate, this article proposes three novel extensions of score matching: (i) to univariate and multivariate ordinal data (including count data); (ii) to INID (independent but not necessarily identically distributed) data models, including regression models with either a continuous or a discrete ordinal response; and (iii) to a class of dependent data models known as auto models. Under the INID assumption, a unified asymptotic approach to settings (i) and (ii) is developed and, under mild regularity conditions, it is proved that the proposed score matching estimators are consistent and asymptotically normal. These theoretical results provide a sound basis for score-matching-based inference and are supported by strong performance in simulation studies and a real data example involving doctoral publication data. Regarding (iii), motivated by a spatial geochemical dataset, we develop a novel auto model for spatially dependent spherical data and propose a score-matching-based Wald statistic to test for the presence of spatial dependence. The proposed auto model offers a way to model spatial dependence of directions, is computationally convenient to use, and is expected to be superior to composite likelihood approaches, for reasons explained in the article.
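To make the basic idea concrete, the following is a minimal sketch of classical score matching in the continuous IID setting only; it does not implement any of the extensions proposed in this article. It assumes the standard Hyvärinen objective, the sample mean of ψ'(x) + ψ(x)²/2, where ψ is the derivative of the log-density with respect to x, and applies it to an unnormalized univariate Gaussian q(x) ∝ exp(-λ(x - μ)²/2). The function names and the simulated data are illustrative assumptions. Because ψ does not involve the normalizing constant, the constant is never needed.

```python
import random
import statistics


def score_matching_objective(xs, mu, lam):
    """Empirical Hyvarinen objective for q(x) proportional to exp(-lam/2 * (x - mu)^2).

    The model score is psi(x) = -lam * (x - mu), so psi'(x) = -lam.
    The objective is the sample mean of psi'(x) + 0.5 * psi(x)^2.
    """
    return sum(-lam + 0.5 * (lam * (x - mu)) ** 2 for x in xs) / len(xs)


def fit_gaussian_score_matching(xs):
    """Closed-form minimizer of the objective above (hypothetical helper).

    Setting the partial derivatives to zero gives mu = sample mean and
    lam = 1 / (mean squared deviation about the sample mean).
    """
    mu_hat = statistics.fmean(xs)
    mean_sq_dev = sum((x - mu_hat) ** 2 for x in xs) / len(xs)
    return mu_hat, 1.0 / mean_sq_dev


# Simulated data (an assumption for illustration): mean 2.0, sd 0.5,
# so the true precision is 1 / 0.5^2 = 4.0.
random.seed(0)
data = [random.gauss(2.0, 0.5) for _ in range(5000)]

mu_hat, lam_hat = fit_gaussian_score_matching(data)
best = score_matching_objective(data, mu_hat, lam_hat)
print(f"mu_hat={mu_hat:.3f}, lam_hat={lam_hat:.3f}, objective={best:.3f}")
```

In this Gaussian case the score matching estimate coincides with the maximum likelihood estimate, which makes the example easy to check. The method is of practical interest for models where the normalizing constant has no tractable form.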
