Missing time-series data is a prevalent problem in many prescriptive analytics models in operations management, healthcare and finance. Imputation methods for time-series data are usually applied to the full panel before a prescriptive model is trained for a downstream out-of-sample task. For example, missing asset returns may be imputed before an optimal portfolio allocation is estimated. However, this practice can introduce look-ahead bias into the future performance of the downstream task, and there is an inherent trade-off between the look-ahead bias of using the entire data set for imputation and the larger variance of using only the training portion of the data set. By connecting layers of information revealed over time, we propose a Bayesian consensus posterior that fuses an arbitrary number of posteriors to optimize the trade-off between variance and look-ahead bias in the imputation. We derive tractable two-step optimization procedures for finding the optimal consensus posterior, with the Kullback-Leibler divergence and the Wasserstein distance as dissimilarity measures between posterior distributions. In simulations and an empirical study, we demonstrate the benefit of our imputation mechanism for portfolio allocation with missing returns.
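To make the idea of fusing posteriors concrete, here is a minimal sketch under strong simplifying assumptions: every posterior is a one-dimensional Gaussian (for instance, one posterior for a missing return imputed from the training window only and one imputed from the full sample), and the fusion weights are fixed in advance. The procedure described above optimizes the consensus, which this sketch does not attempt. Under the Kullback-Leibler divergence, minimizing the weighted sum of KL(q || p_k) over q yields the normalized weighted geometric mean of the p_k, which for Gaussians is precision-weighted pooling. Under the 2-Wasserstein distance, the barycenter of 1-D Gaussians averages the means and the standard deviations. The function names, the Gaussian assumption, and the choice of KL direction are illustrative assumptions, not the authors' formulation.

```python
import math
from typing import Sequence, Tuple

# Hypothetical representation: a 1-D Gaussian posterior as (mean, standard deviation).
Gaussian = Tuple[float, float]


def _check(posteriors: Sequence[Gaussian], weights: Sequence[float]) -> None:
    # Weights are assumed fixed, non-negative and summing to one (an assumption of this sketch).
    if not posteriors or len(posteriors) != len(weights):
        raise ValueError("need one weight per posterior")
    if any(w < 0 for w in weights) or not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    if any(s <= 0 for _, s in posteriors):
        raise ValueError("standard deviations must be positive")


def kl_consensus(posteriors: Sequence[Gaussian], weights: Sequence[float]) -> Gaussian:
    """Minimizer of sum_k w_k * KL(q || p_k) over q.

    The minimizer is proportional to prod_k p_k ** w_k; for Gaussians this is a
    Gaussian whose precision is the weighted sum of precisions.
    """
    _check(posteriors, weights)
    precision = sum(w / s**2 for (_, s), w in zip(posteriors, weights))
    mean = sum(w * m / s**2 for (m, s), w in zip(posteriors, weights)) / precision
    return mean, 1.0 / math.sqrt(precision)


def wasserstein_consensus(posteriors: Sequence[Gaussian], weights: Sequence[float]) -> Gaussian:
    """2-Wasserstein barycenter of 1-D Gaussians: weighted average of means and of std devs."""
    _check(posteriors, weights)
    mean = sum(w * m for (m, _), w in zip(posteriors, weights))
    std = sum(w * s for (_, s), w in zip(posteriors, weights))
    return mean, std


if __name__ == "__main__":
    # Illustrative numbers only: a "training-only" and a "full-sample" posterior.
    posts = [(0.0, 1.0), (3.0, 2.0)]
    print(kl_consensus(posts, [0.5, 0.5]))
    print(wasserstein_consensus(posts, [0.5, 0.5]))
```

With equal weights on N(0, 1) and N(3, 2²), the KL consensus is pulled toward the more precise posterior (mean 0.6, standard deviation about 1.265). The Wasserstein barycenter sits at the midpoint (mean 1.5, standard deviation 1.5). The two measures can therefore favor different fusions of the same inputs.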