We initiate the study of the Careless Coupon Collector's Problem (CCCP), a novel variation of the classical coupon collector problem, which we envision as a model for information systems such as web crawlers, dynamic caches, and fault-resilient networks. In CCCP, a collector attempts to gather $n$ distinct coupon types by obtaining one coupon type uniformly at random in each discrete round. However, the collector is \textit{careless}: at the end of each round, each collected coupon type is independently lost with probability $p$. We analyze the number of rounds required to complete the collection as a function of $n$ and $p$. In particular, we show that it transitions, through multiple distinct phases, from $\Theta(n \ln n)$ when $p = o\big(\frac{\ln n}{n^2}\big)$ up to $\Theta\big((\frac{np}{1-p})^n\big)$ when $p=\omega\big(\frac{1}{n}\big)$. Interestingly, when $p=\frac{c}{n}$, the process remains in a metastable phase, where the fraction of collected coupon types is concentrated around $\frac{1}{1+c}$ with probability $1-o(1)$, for a time window of length $e^{\Theta(n)}$. Finally, we give an algorithm that computes the expected completion time of CCCP in $O(n^2)$ time.
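The metastable behaviour at $p=\frac{c}{n}$ can be explored with a small simulation. The following is a minimal sketch, not the paper's algorithm: the function name `mean_collected_fraction` and its parameters are illustrative assumptions, and the fraction is measured after the loss step of each round. Heuristically, if a fraction $x$ of the types is held, a round gains a new type with probability $1-x$ and loses about $npx = cx$ types in expectation; balancing the two gives $x \approx \frac{1}{1+c}$.

```python
import random


def mean_collected_fraction(n, p, rounds, burn_in=0, seed=None):
    """Simulate CCCP and return the average fraction of coupon types held.

    Assumed round structure (a reading of the abstract, not the paper's code):
      1. draw one coupon type uniformly at random from n types and keep it;
      2. each held type is then lost independently with probability p.
    The fraction is recorded after step 2, for rounds after `burn_in`.
    """
    if not 0 <= burn_in < rounds:
        raise ValueError("burn_in must satisfy 0 <= burn_in < rounds")
    rng = random.Random(seed)
    held = set()
    total = 0.0
    for t in range(rounds):
        # Step 1: obtain one coupon type uniformly at random.
        held.add(rng.randrange(n))
        # Step 2: the careless collector loses each held type w.p. p.
        held = {coupon for coupon in held if rng.random() >= p}
        if t >= burn_in:
            total += len(held) / n
    return total / (rounds - burn_in)


if __name__ == "__main__":
    n, c = 200, 1.0
    frac = mean_collected_fraction(n, c / n, rounds=10_000, burn_in=1_000, seed=0)
    print(f"simulated fraction: {frac:.3f}, heuristic 1/(1+c): {1 / (1 + c):.3f}")
```

For moderate $n$, the simulated average should sit close to $\frac{1}{1+c}$, with a small finite-$n$ correction. The sketch does not detect completion of the collection, which in this regime would only occur on time scales far beyond a practical simulation.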