The reliability of data-driven applications in electric vehicle (EV) infrastructure, such as charging demand forecasting, hinges on the availability of complete, high-quality charging data. However, real-world EV datasets often contain missing records, and existing imputation methods are ill-equipped for the complex, multimodal context of charging data. Many rely on a restrictive one-model-per-station paradigm that ignores valuable inter-station correlations. To address these gaps, we develop PRAIM, a novel PRobabilistic variational imputation framework that leverages large lAnguage models and retrIeval-augmented Memory. PRAIM employs a pre-trained language model to encode heterogeneous data, spanning time-series demand, calendar features, and geospatial context, into a unified, semantically rich representation. This representation is dynamically augmented by a retrieval-augmented memory that draws relevant examples from the entire charging network, so that a single, unified imputation model built on a variational neural architecture can overcome data sparsity. Extensive experiments on four public datasets demonstrate that PRAIM significantly outperforms established baselines both in imputation accuracy and in preserving the statistical distribution of the original data, leading to substantial improvements in downstream forecasting performance.
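To make the retrieval step concrete, the following is a minimal sketch of how a memory bank of embeddings might be queried for the most similar examples across a network. It is an illustration under stated assumptions, not PRAIM's implementation: the function name `retrieve_top_k`, the use of cosine similarity, the toy embeddings, and the value of `k` are all hypothetical choices, since the abstract does not specify how retrieval is performed.

```python
import numpy as np


def retrieve_top_k(query, memory, k=3):
    """Return indices and cosine similarities of the k memory rows closest to query.

    Hypothetical helper: cosine similarity is an assumed retrieval metric.
    """
    # Normalize so that dot products equal cosine similarities.
    q = query / (np.linalg.norm(query) + 1e-12)
    m = memory / (np.linalg.norm(memory, axis=1, keepdims=True) + 1e-12)
    sims = m @ q
    # argsort is ascending, so reverse it and keep the first k entries.
    top = np.argsort(sims)[::-1][:k]
    return top, sims[top]


# Toy memory bank: four made-up embeddings of dimension 3, standing in for
# encoded windows from different stations in the network.
memory = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])

# Made-up embedding of the window whose missing values are to be imputed.
query = np.array([1.0, 0.05, 0.0])

idx, scores = retrieve_top_k(query, memory, k=2)
print(idx, scores)
```

In this toy setup the two retrieved rows are the ones pointing in nearly the same direction as the query. In a full pipeline, the retrieved examples would be passed to the imputation model as additional context, which is what allows one model to serve every station rather than training one per station.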