Are We Still Missing an Item?

The missing item problem, as introduced by Stoeckl in his SODA '23 paper, asks an algorithm to continually identify a missing element $e$ in a stream of elements $e_1, \dots, e_{\ell}$ from the set $\{1,2,\dots,n\}$, that is, an element such that $e \neq e_i$ for every $i \in \{1,\dots,\ell\}$. Stoeckl's investigation focuses primarily on the case $\ell < n$, providing bounds for (i) the deterministic case, (ii) the static case, where the algorithm may be randomized but the stream is fixed in advance, and (iii) the adversarially robust case, where the algorithm is randomized and each stream element may be chosen based on the algorithm's earlier outputs. Building on this foundation, our paper addresses previously unexplored aspects of the missing item problem. In the first part, we examine the static setting with a long stream, where the stream length $\ell$ is close to or even exceeds the size of the universe $n$. We present an algorithm showing that even when $\ell$ is very close to $n$ (say, $\ell = n-1$), $\mathrm{polylog}(n)$ bits of memory suffice to identify a missing item. Additionally, we establish tight bounds of $\tilde{\Theta}(k)$ for the case $\ell = n+k$. The second part of our work focuses on the {\em adversarially robust setting}. We show a lower bound of $\Omega(\ell)$, up to polylog factors, for pseudo-deterministic error-zero algorithms (algorithms that report their errors). Building on Stoeckl's work, we also establish a lower bound for random-start error-zero streaming algorithms (which use randomness only at initialization). In the final part, we explore streaming algorithms with randomness-on-the-fly, where random bits saved for future use are counted in the space cost. For streams of length $\ell = O(\sqrt{n})$, we provide an upper bound of $O(\log n)$ on the space. This establishes a gap between randomness-on-the-fly and random-start algorithms.
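To make the problem definition concrete, the following is a minimal Python sketch of the trivial deterministic baseline. It is an illustrative assumption, not an algorithm from this paper or from Stoeckl's work, and the names `BitmapMissingItem` and `missing_items` are hypothetical. The sketch stores one flag per element of $\{1,\dots,n\}$ and, after each stream element, reports the smallest value not yet seen. It therefore uses $\Theta(n)$ bits of memory, far more than the $\mathrm{polylog}(n)$ bits reported above for the static setting.

```python
from typing import Iterable, Iterator, Optional


class BitmapMissingItem:
    """Naive deterministic baseline for the missing item problem (illustrative only).

    Stores one flag per universe element and, after each stream element,
    reports the smallest element of {1, ..., n} that has not appeared yet.
    """

    def __init__(self, n: int) -> None:
        self.n = n
        # seen[x] == 1 iff x has appeared; index 0 is unused. A bytearray keeps
        # the sketch simple; a packed bitset would bring this down to n bits.
        self.seen = bytearray(n + 1)
        # Smallest value that may still be missing. Since the set of seen
        # elements only grows, this pointer never needs to move backwards.
        self.candidate = 1

    def update(self, x: int) -> Optional[int]:
        """Process one stream element and return a currently missing item."""
        if not 1 <= x <= self.n:
            raise ValueError(f"element {x} is outside the universe 1..{self.n}")
        self.seen[x] = 1
        # Skip over values already seen; total scanning work over the whole
        # stream is O(n) because the pointer only moves forward.
        while self.candidate <= self.n and self.seen[self.candidate]:
            self.candidate += 1
        # None means every element of the universe has appeared at least once.
        return self.candidate if self.candidate <= self.n else None


def missing_items(stream: Iterable[int], n: int) -> Iterator[Optional[int]]:
    """Yield the baseline's answer after each prefix of the stream."""
    alg = BitmapMissingItem(n)
    for x in stream:
        yield alg.update(x)


if __name__ == "__main__":
    # Universe {1, ..., 5}; prints [2, 2, 4, 4].
    print(list(missing_items([1, 3, 2, 5], n=5)))
```

Because this baseline is deterministic, revealing its answers gives an adaptive adversary no advantage, so it is trivially adversarially robust; the price is linear space.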
