We resolve a long-standing open question about the existence of a constant-factor approximation algorithm for the average-case \textsc{Decision Tree} problem with a uniform probability distribution over the hypotheses. We answer the question in the affirmative by providing a simple polynomial-time algorithm with an approximation ratio of $\frac{2}{1-\sqrt{(e+1)/(2e)}}+\varepsilon<11.57$. This improves upon the best previously known algorithm, a greedy one that achieves an $O(\log n/\log\log n)$-approximation. The first key ingredient of our analysis is a decomposition technique known from problems related to \textsc{Hierarchical Clustering} [SODA '17, WALCOM '26], which allows us to decompose the optimal decision tree into a series of objects called separating subfamilies. The second crucial idea is to reduce the subproblem of finding a \textsc{Separating Subfamily} to an instance of the \textsc{Maximum Coverage} problem. To do so, we represent the pairs of hypotheses to be separated as the edges of cliques and analyze the properties of cutting these cliques into small pieces. This yields a good approximation for the \textsc{Separating Subfamily} problem, which in turn enables the design of the approximation algorithm for the original problem.
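To make the coverage view concrete, the following minimal sketch treats every pair of hypotheses as an element and every test as the set of pairs it separates, then runs the textbook greedy heuristic for \textsc{Maximum Coverage}. It illustrates the general idea only and is not the algorithm of this paper: the helper names `separated_pairs` and `greedy_max_coverage`, the toy instance, and the budget `k` are all assumptions made for the example. The last line evaluates the stated constant numerically, ignoring the $\varepsilon$ term.

```python
from itertools import combinations
from math import e, sqrt


def separated_pairs(outcomes):
    """Pairs of hypothesis indices that a test tells apart (hypothetical helper).

    outcomes[i] is the result of the test on hypothesis i.
    """
    return {
        (i, j)
        for i, j in combinations(range(len(outcomes)), 2)
        if outcomes[i] != outcomes[j]
    }


def greedy_max_coverage(sets, k):
    """Textbook greedy: pick up to k sets, each maximizing newly covered elements."""
    covered, chosen = set(), []
    for _ in range(k):
        candidates = (i for i in range(len(sets)) if i not in chosen)
        best = max(candidates, key=lambda i: len(sets[i] - covered), default=None)
        if best is None or not (sets[best] - covered):
            break  # nothing left that covers a new pair
        chosen.append(best)
        covered |= sets[best]
    return chosen, covered


# Toy instance (assumed for illustration): 4 hypotheses, 3 tests.
tests = [
    [0, 0, 1, 1],
    [0, 1, 0, 1],
    [0, 0, 0, 1],
]
pair_sets = [separated_pairs(t) for t in tests]
chosen, covered = greedy_max_coverage(pair_sets, k=2)

# Numeric value of the approximation ratio without the epsilon term.
ratio = 2 / (1 - sqrt((e + 1) / (2 * e)))
```

On this toy instance the first two tests together separate all six pairs of hypotheses, so two tests suffice to identify any hypothesis.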