Historical datasets recorded for local government districts, such as the England and Wales infant mortality data, can provide valuable insights into our changing society. In practice, such analyses are challenging because the boundaries of the districts for which records are collected change frequently. One solution adopted in the literature is to pre-process the data using areal interpolation, rendering the units consistent over the time period of interest. However, such methods are prone to errors. In this paper we introduce a novel changepoint method to detect instances where interpolation performs poorly. We demonstrate the utility of our method on original data, and show how correcting interpolation errors can affect the clustering of the infant mortality curves.