The proliferation of mobile devices has made it possible to collect large amounts of population data, creating a need to put this rich, multidimensional data to practical use. In response, we integrated functional data analysis (FDA) and factor analysis to predict hourly population changes across districts in Tokyo. Specifically, by assuming a Gaussian process, we avoided estimating the large number of covariance matrix parameters that a multivariate normal distribution would require. The data were also dependent both over time and across districts. To capture these dependencies, we introduced a Bayesian factor model that models the time series of a small number of common factors and expresses the spatial structure through factor loading matrices. The factor loading matrices were constrained to be identifiable and sparse so that the model remains interpretable. We also proposed a Bayesian shrinkage method as a systematic approach to factor selection. Through numerical experiments and data analysis, we investigated the predictive accuracy and interpretability of the proposed method. We concluded that the method is flexible enough to incorporate additional time series features, which improves its accuracy.
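The factor structure described above can be illustrated with a minimal simulation sketch. It is not the authors' model: the AR(1) factor dynamics, the random sparsity mask, the Gaussian noise, and every name and default value (`simulate_factor_model`, `phi`, `sparsity`, `noise_sd`) are assumptions chosen for illustration. It shows only the general idea that a few common time series, mapped through a sparse loading matrix, can generate dependent hourly series for many districts.

```python
import numpy as np


def simulate_factor_model(n_districts=50, n_factors=3, n_hours=24 * 14,
                          phi=0.8, noise_sd=0.1, sparsity=0.3, seed=0):
    """Simulate y_t = Lambda f_t + e_t with AR(1) factors and sparse loadings.

    All settings are hypothetical; they are not taken from the paper.
    """
    rng = np.random.default_rng(seed)

    # Sparse loading matrix (districts x factors): each entry is kept with
    # probability `sparsity` and set to zero otherwise.
    mask = rng.random((n_districts, n_factors)) < sparsity
    loadings = rng.normal(size=(n_districts, n_factors)) * mask

    # Common factors follow independent AR(1) processes over the hours.
    factors = np.zeros((n_hours, n_factors))
    for t in range(1, n_hours):
        factors[t] = phi * factors[t - 1] + rng.normal(size=n_factors)

    # Each district's series is a loading-weighted mix of the factors plus noise.
    noise = noise_sd * rng.normal(size=(n_hours, n_districts))
    observations = factors @ loadings.T + noise
    return loadings, factors, observations


if __name__ == "__main__":
    loadings, factors, observations = simulate_factor_model()
    print(observations.shape)  # (hours, districts)
```

In this sketch, temporal dependence enters only through the factors and cross-district dependence only through the loadings, which is why zeros in the loading matrix can be read as "this district is unaffected by that factor".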