Efficient estimation methods for simultaneous autoregressive (SAR) models with missing data in the response variable are well developed in the literature. It is common practice to introduce a measurement error term into SAR models to distinguish the noise component from the spatial process. However, previous work has not considered adding measurement error to SAR models with missing data. Maximum likelihood estimation for such models is challenging and computationally expensive for large datasets. This paper proposes two efficient likelihood-based estimation methods, the marginal maximum likelihood (ML) method and the expectation-maximisation (EM) algorithm, for estimating SAR models with both measurement error and missing data in the response variable. Two popular SAR model types are considered: the spatial error model (SEM) and the spatial autoregressive model (SAM). The missing data mechanism is assumed to be missing at random (MAR). While naive calculation approaches have a computational complexity of $O(n^3)$, where $n$ is the total number of observations, our computational approaches for both the marginal ML and EM algorithms are designed to reduce this complexity. The performance of the proposed methods is investigated empirically using simulated and real datasets.
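To make the $O(n^3)$ remark concrete, the following is a minimal sketch of a naive observed-data (marginal) log-likelihood for an SEM with measurement error. The abstract does not state the model equations, so the specification here is an assumption based on the textbook SEM form: $y = X\beta + u + \varepsilon$ with $u = \rho W u + e$, $e \sim N(0, \sigma_e^2 I)$ and $\varepsilon \sim N(0, \sigma_\varepsilon^2 I)$, which gives $\mathrm{Cov}(y) = \sigma_e^2 (A^\top A)^{-1} + \sigma_\varepsilon^2 I$ for $A = I - \rho W$. The function name and all parameter names are hypothetical. This is the naive baseline only, not the efficient approach the paper proposes.

```python
import numpy as np


def naive_sem_marginal_loglik(y_obs, X, W, observed, beta, rho,
                              sigma2_e, sigma2_eps):
    """Naive Gaussian log-likelihood of the observed responses.

    Assumed model (not taken from the abstract):
        y = X beta + u + eps,  u = rho W u + e,
        e ~ N(0, sigma2_e I),  eps ~ N(0, sigma2_eps I).
    `observed` is a boolean mask of length n marking observed responses.
    """
    n = W.shape[0]
    idx = np.flatnonzero(observed)

    # Dense n x n inverse: this step alone costs O(n^3).
    A = np.eye(n) - rho * W
    Sigma = sigma2_e * np.linalg.inv(A.T @ A) + sigma2_eps * np.eye(n)

    # Under MAR, likelihood inference can use the marginal distribution of
    # the observed block: y_o ~ N(X_o beta, Sigma_oo).
    Sigma_oo = Sigma[np.ix_(idx, idx)]
    resid = y_obs - X[idx] @ beta

    # Cholesky factor gives both the log-determinant and the quadratic form.
    L = np.linalg.cholesky(Sigma_oo)
    z = np.linalg.solve(L, resid)
    logdet = 2.0 * np.sum(np.log(np.diag(L)))

    return -0.5 * (idx.size * np.log(2.0 * np.pi) + logdet + z @ z)
```

Every evaluation forms a dense $n \times n$ inverse and factorises the observed block, so each likelihood call is cubic in the number of units. Sparse structure in $W$ is the usual route to avoiding this cost, although the abstract does not say how the proposed methods achieve the reduction.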