Resampling methods such as the bootstrap have proven invaluable in the field of machine learning. However, the applicability of traditional bootstrap methods is limited when dealing with large streams of dependent data, such as time series or spatially correlated observations. In this paper, we propose a novel bootstrap method that is designed to account for data dependencies and can be executed online, making it particularly suitable for real-time applications. This method is based on an autoregressive sequence of increasingly dependent resampling weights. We prove the theoretical validity of the proposed bootstrap scheme under general conditions. We demonstrate the effectiveness of our approach through extensive simulations and show that it provides reliable uncertainty quantification even in the presence of complex data dependencies. Our work bridges the gap between classical resampling techniques and the demands of modern data analysis, providing a valuable tool for researchers and practitioners in dynamic, data-rich environments.
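To illustrate the idea of an online bootstrap driven by an autoregressive sequence of increasingly dependent weights, the following is a minimal Python sketch for the running mean of a data stream. It is not the paper's algorithm: the class name `OnlineARBootstrap`, the weight recursion, the coefficient schedule `rho_i = 1 - i**(-beta)`, the default `beta = 0.3`, and the percentile interval are all assumptions chosen for illustration. Each bootstrap replicate keeps only its current weight and two running sums, so the cost per observation is constant and no past data needs to be stored.

```python
import math
import random


class OnlineARBootstrap:
    """Illustrative online multiplier bootstrap for a running mean.

    Assumed scheme (not taken from the paper): each replicate carries one
    weight following an AR(1) recursion with mean 1 and variance 1, whose
    coefficient rho_i = 1 - i**(-beta) approaches 1 as i grows, so that
    successive weights become increasingly dependent.
    """

    def __init__(self, n_replicates=200, beta=0.3, seed=0):
        self.beta = beta                      # hypothetical tuning parameter
        self.rng = random.Random(seed)
        self.n = 0                            # observations seen so far
        self.total = 0.0                      # plain running sum
        self.weights = [1.0] * n_replicates   # current weight per replicate
        self.weighted_sums = [0.0] * n_replicates
        self.weight_sums = [0.0] * n_replicates

    def update(self, x):
        """Process one new observation in O(n_replicates) time."""
        self.n += 1
        self.total += x
        rho = 1.0 - self.n ** (-self.beta)    # equals 0 at the first step
        scale = math.sqrt(1.0 - rho * rho)    # keeps the weight variance at 1
        for b in range(len(self.weights)):
            z = self.rng.gauss(0.0, 1.0)
            v = 1.0 + rho * (self.weights[b] - 1.0) + scale * z
            self.weights[b] = v
            self.weighted_sums[b] += v * x
            self.weight_sums[b] += v

    def estimate(self):
        """Unweighted running mean of the stream."""
        return self.total / self.n

    def replicates(self):
        """Weighted means, one per bootstrap replicate."""
        return [s / w for s, w in zip(self.weighted_sums, self.weight_sums)]

    def interval(self, level=0.95):
        """Simple percentile interval over the replicates."""
        reps = sorted(self.replicates())
        alpha = (1.0 - level) / 2.0
        lo = reps[int(math.floor(alpha * (len(reps) - 1)))]
        hi = reps[int(math.ceil((1.0 - alpha) * (len(reps) - 1)))]
        return lo, hi


# Usage: feed a dependent AR(1) stream (true mean 0) one value at a time.
rng = random.Random(1)
boot = OnlineARBootstrap(n_replicates=100, beta=0.3, seed=2)
x = 0.0
for _ in range(1000):
    x = 0.5 * x + rng.gauss(0.0, 1.0)
    boot.update(x)
print(boot.estimate(), boot.interval())
```

In this sketch, a coefficient close to 1 makes neighbouring observations receive similar weights, which mimics the effect of resampling blocks whose length grows with the stream. Gaussian weights can be negative, so the ratio in `replicates` is only well behaved once enough observations have accumulated.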