The missing item problem, introduced by Stoeckl at SODA 2023, asks an algorithm to continually identify a missing element $e$ in a stream of elements $\{e_1, \ldots, e_{\ell}\}$ from the set $\{1,2,\ldots,n\}$, that is, an element such that $e \neq e_i$ for every $i \in \{1,\ldots,\ell\}$. Stoeckl's investigation focuses primarily on the regime $\ell<n$ and provides bounds for (i) the deterministic case, (ii) the static case, where the algorithm may be randomized but the stream is fixed in advance, and (iii) the adversarially robust case, where the algorithm is randomized and each stream element may be chosen depending on earlier algorithm outputs. Building on this foundation, our paper addresses previously unexplored aspects of the missing item problem. In the first part, we examine the static setting with a long stream, where the length of the stream $\ell$ is close to or even exceeds the size of the universe $n$. We present an algorithm showing that even when $\ell$ is very close to $n$ (say $\ell=n-1$), $\mathrm{polylog}(n)$ bits of memory suffice to identify the missing item. When the stream's length exceeds the size of the universe, i.e., $\ell = n+k$, we show a tight bound of roughly $\Theta(k)$. The second part focuses on the adversarially robust setting. We show a lower bound of approximately $\Omega(\ell)$, up to polylog factors, for pseudo-deterministic error-zero algorithms (error-zero meaning that the algorithm reports its errors). Combining Stoeckl's work with this result, we establish a tight bound of roughly $\Theta(\sqrt{\ell})$ for random-start error-zero streaming algorithms (random-start meaning that randomness is used only at initialization).
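To make the problem statement concrete, the following is a minimal sketch of a naive deterministic baseline. It is not an algorithm from this paper or from Stoeckl's work: it simply remembers every element seen so far, using about $n$ bits of state, and after each stream element outputs the smallest item that has not yet appeared. The function name `naive_missing_item` and its interface are assumptions made for illustration only.

```python
from typing import Iterable, Iterator


def naive_missing_item(stream: Iterable[int], n: int) -> Iterator[int]:
    """After each stream element, yield an item of {1, ..., n} not seen so far.

    Illustrative baseline only: it stores one flag per universe element,
    far more memory than the bounds discussed in the abstract.
    """
    seen = [False] * (n + 1)  # one flag per universe element; index 0 is unused
    candidate = 1             # smallest item that has not appeared yet
    for e in stream:
        seen[e] = True
        # Advance past items that have already appeared in the stream.
        while candidate <= n and seen[candidate]:
            candidate += 1
        if candidate > n:
            raise ValueError("every item of the universe has appeared")
        yield candidate
```

For example, on the stream 1, 2, 4, 1 with $n = 5$, the sketch outputs 2, 3, 3, 3, one answer after each arriving element. The results summarized above concern how much less memory than this baseline suffices in the various settings.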