We study the equivalence testing problem in the conditional sampling model (CFGM, SICOMP16; CRS, SICOMP15), in which a tester can obtain a sample from the distribution conditioned on any subset of the domain. The goal is to determine whether two unknown distributions on $[n]$ are equal or $\epsilon$-far in total variation distance. Equivalence testing is a central problem in distribution testing and has been studied extensively in various sampling models. Despite significant efforts over the years, a gap remains between the best-known upper bound of $\tilde{O}(\log \log n)$ [FJOPS, COLT 2015] and the best-known lower bound of $\Omega(\sqrt{\log \log n})$ [ACK, RANDOM 2015, Theory of Computing 2018]. Closing this gap has been repeatedly posed as an open problem (listed as Problems 66 and 87 at sublinear.info). In this paper, we completely resolve the query complexity of this problem by proving a lower bound of $\tilde{\Omega}(\log \log n)$. To this end, we develop a novel and generic proof technique that breaks the $\sqrt{\log \log n}$ barrier, not only for equivalence testing but also for other distribution testing problems, such as testing the uniblock property.
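To make the two objects in the problem statement concrete, here is a minimal Python sketch of a conditional sampling (COND) oracle and of the total variation distance used to define $\epsilon$-far. It is illustrative only, not the paper's construction. The names `cond_sample` and `tv_distance` are hypothetical. The domain $[n]$ is represented as `0..n-1`. The behavior when the queried subset has zero probability mass (return a uniform element of the subset) is an assumed convention.

```python
import random
from typing import Sequence, Set


def tv_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Total variation distance: half the L1 distance between p and q."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def is_eps_far(p: Sequence[float], q: Sequence[float], eps: float) -> bool:
    """p and q are eps-far if their total variation distance is at least eps."""
    return tv_distance(p, q) >= eps


def cond_sample(p: Sequence[float], subset: Set[int], rng: random.Random) -> int:
    """Hypothetical COND oracle: draw i ~ p conditioned on i being in `subset`.

    Assumption for this sketch: if p puts zero mass on `subset`, return a
    uniformly random element of `subset`.
    """
    elems = sorted(subset)
    weights = [p[i] for i in elems]
    if sum(weights) == 0:
        return rng.choice(elems)
    # random.choices normalizes the weights, which implements conditioning.
    return rng.choices(elems, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    p = [0.5, 0.5, 0.0, 0.0]
    q = [0.25, 0.25, 0.25, 0.25]
    print(tv_distance(p, q), is_eps_far(p, q, 0.1))
    print([cond_sample(p, {1, 2}, rng) for _ in range(5)])
```

A tester in this model interacts with the unknown distributions only through calls like `cond_sample`. The lower bound above counts how many such calls are needed to distinguish $p = q$ from $\epsilon$-far pairs.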