With the proliferation of mobile devices, an increasing amount of population data is being collected, and there is growing demand to use these large-scale, multidimensional data in real-world applications. We introduced functional data analysis (FDA) to the problem of predicting the hourly population of districts in Tokyo. FDA treats and analyzes longitudinal data as curves, which reduces the number of parameters and makes high-dimensional data easier to handle. Specifically, by assuming a Gaussian process, we avoided estimating the large covariance matrix that a multivariate normal distribution would require. In addition, the data exhibit temporal dependence as well as spatial dependence between districts. To capture these characteristics, we introduced a Bayesian factor model that models the time series of a small number of common factors and expresses the spatial structure through factor loading matrices. Furthermore, the factor loading matrices were made identifiable and sparse to ensure the interpretability of the model. We also proposed a method for selecting factors via Bayesian shrinkage. We examined the forecast accuracy and interpretability of the proposed method through numerical experiments and data analysis. We found that the flexibility of the proposed method allows it to be extended to reflect further time-series features, which contributed to forecast accuracy.
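As a rough illustration of the generative structure described above, the following is a minimal sketch of a simplified dynamic functional factor model. Each day's 24-hour population profile for every district is a linear combination of a few common factor curves. Each factor curve follows an AR(1) recursion from day to day, with innovations drawn from a Gaussian process over the hours. The loading matrix is lower-triangular with a positive diagonal (a common identifiability constraint) and sparse below the diagonal. The squared-exponential kernel, the AR(1) dynamics, the specific identifiability constraint, and all function names and parameter values are illustrative assumptions, not the authors' specification. Bayesian estimation and shrinkage-based factor selection are not shown.

```python
import numpy as np


def se_kernel(t, length_scale=3.0, variance=1.0):
    """Squared-exponential GP covariance matrix over time points t (assumed kernel)."""
    diff = t[:, None] - t[None, :]
    return variance * np.exp(-0.5 * (diff / length_scale) ** 2)


def sparse_lower_triangular_loadings(p, k, density, rng):
    """Hypothetical p x k loading matrix: lower-triangular with a positive
    diagonal (identifiability) and randomly sparse below-diagonal entries."""
    lam = rng.normal(size=(p, k)) * (rng.random((p, k)) < density)
    lam = np.tril(lam)  # zero out entries above the diagonal
    idx = np.arange(k)
    lam[idx, idx] = np.abs(rng.normal(size=k)) + 0.5  # strictly positive diagonal
    return lam


def simulate(n_days=30, p=10, k=3, phi=0.7, noise_sd=0.1, seed=0):
    """Simulate y[d] = Lambda @ F[d] + noise for p districts, k factors, 24 hours/day."""
    rng = np.random.default_rng(seed)
    hours = np.arange(24, dtype=float)
    # Cholesky factor of the GP covariance; small jitter keeps it positive definite
    chol = np.linalg.cholesky(se_kernel(hours) + 1e-6 * np.eye(24))
    lam = sparse_lower_triangular_loadings(p, k, density=0.4, rng=rng)

    factors = np.zeros((n_days, k, 24))
    y = np.zeros((n_days, p, 24))
    prev = np.zeros((k, 24))
    for d in range(n_days):
        # Each column of chol @ z is a GP draw over the 24 hours; transpose to (k, 24)
        innovation = (chol @ rng.standard_normal((24, k))).T
        # Day-to-day AR(1) dependence on the factor curves
        factors[d] = phi * prev + innovation
        prev = factors[d]
        # Districts load on the common factor curves plus independent noise
        y[d] = lam @ factors[d] + noise_sd * rng.standard_normal((p, 24))
    return y, factors, lam


y, factors, lam = simulate()
print(y.shape, factors.shape, lam.shape)
```

Here the spatial structure lives entirely in `lam` (which districts share which factors), while the temporal structure lives in the factor curves, so only `k` curves need a time-series model rather than all `p` districts.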