Scenario-Based Robust Optimization of Tree Structures

We initiate the study of tree structures in the context of scenario-based robust optimization. Specifically, we study Binary Search Trees (BSTs) and Huffman coding, two fundamental techniques for efficiently managing and encoding data based on a known set of frequencies of keys. Given $k$ different scenarios, each defined by a distinct frequency distribution over the keys, our objective is to compute a single tree of best-possible performance, relative to any scenario. As performance metrics, we consider the competitive ratio, which multiplicatively compares the cost of the solution to that of the tree of least cost among all scenarios, and the regret, which induces a similar, but additive, comparison. For BSTs, we show that the problem is NP-hard under both metrics. We also show how to obtain a tree of competitive ratio $\lceil \log_2(k+1) \rceil$, and we prove that this ratio is optimal. For Huffman trees, we show that the problem is likewise NP-hard under both metrics; we also give an algorithm of regret $\lceil \log_2 k \rceil$, which we show is near-optimal by proving a lower bound of $\lfloor \log_2 k \rfloor$. Last, we give a polynomial-time algorithm for computing Pareto-optimal BSTs with respect to their regret, assuming scenarios defined by uniform distributions over the keys. This setting captures, in particular, the first study of fairness in the context of data structures. We provide an experimental evaluation of all algorithms. To this end, we also provide a mixed integer linear programming formulation for computing optimal trees.
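
To make the two metrics concrete, the following is a minimal illustrative sketch, not the algorithms of this work. It evaluates one fixed BST against several scenarios under stated assumptions: the cost of a tree in a scenario is taken to be the frequency-weighted sum of key depths (root at depth 1), the competitive ratio is taken as the worst ratio over scenarios between the tree's cost and that scenario's optimal cost, and the regret as the worst difference. The function names `optimal_bst_cost`, `tree_cost`, and `robust_metrics` are hypothetical, and the per-scenario optimum is computed with the textbook cubic-time dynamic program.

```python
from functools import lru_cache


def optimal_bst_cost(freq):
    """Least cost of any BST over keys 0..n-1 for one scenario.

    Assumed cost model: sum of freq[i] * depth(i), with the root at depth 1.
    Uses the classic O(n^3) dynamic program over key intervals.
    """
    n = len(freq)
    prefix = [0]
    for f in freq:
        prefix.append(prefix[-1] + f)

    @lru_cache(maxsize=None)
    def best(i, j):
        # Optimal cost of a BST on the half-open key interval [i, j).
        if i >= j:
            return 0
        weight = prefix[j] - prefix[i]  # every key in the interval sinks one level
        return weight + min(best(i, r) + best(r + 1, j) for r in range(i, j))

    return best(0, n)


def tree_cost(depth, freq):
    """Cost of a fixed tree, given each key's depth, under one scenario."""
    return sum(f * d for f, d in zip(freq, depth))


def robust_metrics(depth, scenarios):
    """Worst-case (competitive ratio, regret) of a fixed tree over all scenarios.

    Assumption: both metrics compare against the optimum of each scenario
    and take the maximum over scenarios.
    """
    ratios, regrets = [], []
    for freq in scenarios:
        opt = optimal_bst_cost(freq)
        cost = tree_cost(depth, freq)
        ratios.append(cost / opt)
        regrets.append(cost - opt)
    return max(ratios), max(regrets)


if __name__ == "__main__":
    # Two hypothetical scenarios over three keys, with opposite skew.
    scenarios = [(8, 1, 1), (1, 1, 8)]
    balanced = [2, 1, 2]  # middle key at the root
    skewed = [1, 2, 3]    # smallest key at the root, right-leaning path
    print(robust_metrics(balanced, scenarios))
    print(robust_metrics(skewed, scenarios))
```

In this toy instance the right-leaning path is optimal for the first scenario but costly in the second, whereas the balanced tree is suboptimal in both scenarios yet has the smaller worst-case regret. This is the kind of trade-off a single robust tree has to resolve.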
