We present a new adaptive algorithm for learning discrete distributions under distribution drift. In this setting, we observe a sequence of independent samples from a discrete distribution that changes over time, and the goal is to estimate the current distribution. Since only a single sample is available at each time step, good estimation requires a careful choice of the number of past samples to use. Using more samples means reaching further into the past, which incurs a drift error: the bias introduced by the change in distribution. Using only a few past samples instead incurs a large statistical error, because the estimate has high variance. Our algorithm resolves this trade-off without any prior knowledge of the drift. Unlike previous adaptive results, it characterizes the statistical error using data-dependent bounds. This technical difference lets us overcome a limitation of previous work, which requires a fixed finite support whose size is known in advance and cannot change over time. It also yields tighter bounds that depend on the complexity of the drifting distribution, and it extends to distributions with infinite support.
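The trade-off between drift error and statistical error can be illustrated with a small simulation. The sketch below is not the adaptive algorithm of this work: it only compares fixed window sizes against a target distribution that is known to the simulation, which a real estimator would not have. The drift model (`drifting_dist`), the three-symbol support, and all function names are hypothetical choices made for illustration.

```python
import random
from collections import Counter


def empirical(samples):
    """Empirical distribution of a list of samples, as {symbol: frequency}."""
    n = len(samples)
    return {x: c / n for x, c in Counter(samples).items()}


def tv_distance(p, q):
    """Total variation distance between two distributions given as dicts."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)


def drifting_dist(t, horizon):
    """Hypothetical drift: mass moves linearly from symbol 'a' to symbol 'b'."""
    w = t / horizon
    return {"a": 0.8 * (1 - w) + 0.1 * w, "b": 0.1 * (1 - w) + 0.8 * w, "c": 0.1}


def simulate(horizon, rng):
    """Draw one independent sample per time step from the drifting distribution."""
    out = []
    for t in range(1, horizon + 1):
        p = drifting_dist(t, horizon)
        out.append(rng.choices(list(p), weights=list(p.values()))[0])
    return out


def window_errors(samples, target, windows):
    """TV error of the empirical distribution over the last r samples, per r."""
    return {r: tv_distance(empirical(samples[-r:]), target) for r in windows}


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs give the same samples
    horizon = 4000
    samples = simulate(horizon, rng)
    target = drifting_dist(horizon, horizon)  # the current distribution
    for r, err in window_errors(samples, target, [10, 100, 1000, 4000]).items():
        print(f"window {r:5d}: TV error {err:.3f}")
```

Under this assumed drift, very short windows tend to give noisy estimates and very long windows tend to be biased toward the older distribution. An adaptive method has to choose the window from the data alone, without access to `target`.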