Resampling methods such as the bootstrap have proven invaluable in the field of machine learning. However, the applicability of traditional bootstrap methods is limited when dealing with large streams of dependent data, such as time series or spatially correlated observations. In this paper, we propose a novel bootstrap method that is designed to account for data dependencies and can be executed online, making it particularly suitable for real-time applications. This method is based on an autoregressive sequence of increasingly dependent resampling weights. We prove the theoretical validity of the proposed bootstrap scheme under general conditions. We demonstrate the effectiveness of our approach through extensive simulations and show that it provides reliable uncertainty quantification even in the presence of complex data dependencies. Our work bridges the gap between classical resampling techniques and the demands of modern data analysis, providing a valuable tool for researchers and practitioners in dynamic, data-rich environments.
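To make the idea of autoregressive resampling weights concrete, the following is a minimal sketch of an online multiplier bootstrap for a running mean. It is an illustration under stated assumptions, not the scheme analyzed in the paper: the function name `online_ar_bootstrap_mean`, the parameters `n_boot` and `beta`, the Gaussian innovations, the coefficient schedule `rho_t = 1 - t**(-beta)`, and the centered form of the bootstrap mean are all hypothetical choices made here for illustration. Each bootstrap replicate keeps only its current weight and two running sums, so memory does not grow with the length of the stream.

```python
import numpy as np


def online_ar_bootstrap_mean(stream, n_boot=200, beta=0.5, seed=0):
    """Sketch of an online bootstrap for the mean with AR(1)-type weights.

    Assumptions (not taken from the paper): Gaussian innovations, the
    schedule rho_t = 1 - t**(-beta), and the centered multiplier form
    used for the bootstrap replicates of the mean.
    """
    rng = np.random.default_rng(seed)
    dev = np.zeros(n_boot)       # current weight deviation (w_t - 1), one per replicate
    sum_dev_x = np.zeros(n_boot)  # running sum of (w_t - 1) * x_t
    sum_dev = np.zeros(n_boot)    # running sum of (w_t - 1)
    total, n = 0.0, 0

    for t, x in enumerate(stream, start=1):
        # The AR coefficient grows towards 1, so successive weights
        # become increasingly dependent as the stream gets longer.
        rho = 1.0 - t ** (-beta)
        noise = rng.standard_normal(n_boot)
        # AR(1) update that keeps the weights at mean 1 and variance 1.
        dev = rho * dev + np.sqrt(1.0 - rho**2) * noise
        sum_dev_x += dev * x
        sum_dev += dev
        total += x
        n += 1

    if n == 0:
        raise ValueError("stream must contain at least one observation")

    mean = total / n
    # Centered multiplier bootstrap: mean + (1/n) * sum (w_t - 1) * (x_t - mean).
    boot_means = mean + (sum_dev_x - mean * sum_dev) / n
    return mean, boot_means
```

Under these assumptions, an approximate 95% interval for the mean could be read off the replicates with `np.quantile(boot_means, [0.025, 0.975])`. The centered form is used so that a replicate never divides by a sum of weights that happens to be close to zero.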