In this paper, we study the problem of testing constrained samplers over high-dimensional distributions with $(\varepsilon,\eta,\delta)$ guarantees. Samplers are increasingly used in a wide range of safety-critical ML applications, and hence the testing problem has gained importance. For $n$-dimensional distributions, the existing state-of-the-art algorithm, $\mathsf{Barbarik2}$, has a worst-case query complexity exponential in $n$ and hence is not ideal for use in practice. Our primary contribution is an exponentially faster algorithm, $\mathsf{Barbarik3}$, that has a query complexity linear in $n$ and hence can easily scale to larger instances. We demonstrate our claim by implementing our algorithm and comparing it against $\mathsf{Barbarik2}$. Our experiments on the samplers $\mathsf{wUnigen3}$ and $\mathsf{wSTS}$ find that $\mathsf{Barbarik3}$ requires $10\times$ fewer samples for $\mathsf{wUnigen3}$ and $450\times$ fewer samples for $\mathsf{wSTS}$ than $\mathsf{Barbarik2}$.