Agnostic proper learning of monotone functions: beyond the black-box correction barrier

We give the first agnostic, efficient, proper learning algorithm for monotone Boolean functions. Given $2^{\tilde{O}(\sqrt{n}/\varepsilon)}$ uniformly random examples of an unknown function $f:\{\pm 1\}^n \rightarrow \{\pm 1\}$, our algorithm outputs a hypothesis $g:\{\pm 1\}^n \rightarrow \{\pm 1\}$ that is monotone and $(\mathrm{opt} + \varepsilon)$-close to $f$, where $\mathrm{opt}$ is the distance from $f$ to the closest monotone function. The running time of the algorithm (and consequently the size and evaluation time of the hypothesis) is also $2^{\tilde{O}(\sqrt{n}/\varepsilon)}$, nearly matching the lower bound of Blais et al. (RANDOM '15). We also give an algorithm that estimates, up to additive error $\varepsilon$, the distance of an unknown function $f$ to monotonicity in time $2^{\tilde{O}(\sqrt{n}/\varepsilon)}$. Previously, sample-efficient algorithms were known for both of these problems, but they were not run-time efficient. Our work thus closes the gap between the known run-time and sample complexity. This work builds upon the improper learning algorithm of Bshouty and Tamon (JACM '96) and the proper semiagnostic learning algorithm of Lange, Rubinfeld, and Vasilyan (FOCS '22), which obtains a non-monotone Boolean-valued hypothesis, then ``corrects'' it to monotone using query-efficient local computation algorithms on graphs. This black-box correction approach can achieve no error better than $2\mathrm{opt} + \varepsilon$ information-theoretically; we bypass this barrier by (a) augmenting the improper learner with a convex optimization step, and (b) learning and correcting a real-valued function before rounding its values to Boolean. Our real-valued correction algorithm solves the ``poset sorting'' problem of [LRV22] for functions over general posets with non-Boolean labels.
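To make the quantities in the abstract concrete, the following is a minimal Python sketch of the definitions only: monotonicity on $\{\pm 1\}^n$, distance under the uniform distribution, and $\mathrm{opt}$ computed by exhaustive search. It is not the algorithm of the paper; the brute-force search takes time doubly exponential in $n$ and is usable only for tiny $n$. The helper names (`cube`, `is_monotone`, `dist`, `brute_force_opt`) are hypothetical and chosen for illustration.

```python
from itertools import product

def cube(n):
    """All points of {-1, +1}^n as tuples."""
    return list(product((-1, 1), repeat=n))

def is_monotone(g, n):
    """g maps points to +-1; monotone means raising any one coordinate never lowers g."""
    for x in cube(n):
        for i in range(n):
            if x[i] == -1:
                y = x[:i] + (1,) + x[i + 1:]  # x with coordinate i raised to +1
                if g[x] > g[y]:
                    return False
    return True

def dist(f, g, n):
    """Fraction of points on which f and g disagree (uniform distribution)."""
    pts = cube(n)
    return sum(f[x] != g[x] for x in pts) / len(pts)

def brute_force_opt(f, n):
    """Distance from f to the closest monotone function, by trying all 2^(2^n) labelings."""
    pts = cube(n)
    best = 1.0
    for labels in product((-1, 1), repeat=len(pts)):
        g = dict(zip(pts, labels))
        if is_monotone(g, n):
            best = min(best, dist(f, g, n))
    return best

# Illustrative input (an assumption, not from the paper): the anti-dictator f(x) = -x_1 on n = 2.
n = 2
anti_dictator = {x: -x[0] for x in cube(n)}
opt = brute_force_opt(anti_dictator, n)
```

In this notation, an agnostic proper learner must return some `g` with `is_monotone(g, n)` true and `dist(f, g, n)` at most `opt` plus $\varepsilon$, whereas the black-box correction approach described above is limited to $2\mathrm{opt} + \varepsilon$.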
