We consider the task of properly PAC learning decision trees with queries. Recent work of Koch, Strassle, and Tan showed that the strictest version of this task, where the hypothesis tree $T$ is required to be optimally small, is NP-hard. Their work leaves open the question of whether the task remains intractable if $T$ is only required to be close to optimal, say within a factor of 2, rather than exactly optimal. We answer this affirmatively and show that the task indeed remains NP-hard even if $T$ is allowed to be within any constant factor of optimal. More generally, our result allows for a smooth tradeoff between the hardness assumption and the inapproximability factor. As Koch et al.'s techniques do not appear to be amenable to such a strengthening, we first recover their result with a new and simpler proof, which we couple with a new XOR lemma for decision trees. While there is a large body of work on XOR lemmas for decision trees, our setting necessitates parameters that are extremely sharp, and are not known to be attainable by existing XOR lemmas. Our work also carries new implications for the related problem of Decision Tree Minimization.
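To orient the reader, XOR lemmas for decision trees are hardness amplification statements; the following is a schematic template with illustrative parameters, not the statement proved in this work. Writing
\[
  f^{\oplus k}\big(x^{(1)}, \ldots, x^{(k)}\big) := f\big(x^{(1)}\big) \oplus \cdots \oplus f\big(x^{(k)}\big)
\]
for the $k$-fold XOR of $f \colon \{0,1\}^n \to \{0,1\}$, a typical lemma asserts: if every size-$s$ decision tree agrees with $f$ on at most a $1 - \delta$ fraction of inputs, then every size-$s'$ decision tree agrees with $f^{\oplus k}$ on at most a
\[
  \frac{1}{2} + \frac{1}{2}\,(1 - 2\delta)^{\Omega(k)}
\]
fraction of inputs. The tradeoff among $s'$, $s$, $\delta$, and $k$ is what varies across lemmas, and it is this tradeoff that our setting requires to be unusually sharp.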