Tractable Maximization of Budgeted Phylogenetic Diversity on Networks Utilizing Node Scanwidth

Identifying a subset of taxa that maximizes Phylogenetic Diversity (PD) is a cornerstone of quantitative conservation planning. Traditionally, PD is defined over a phylogenetic tree in which leaves represent present-day taxa and branch lengths capture estimated evolutionary distinctiveness. While PD maximization is computationally tractable on trees with unit costs, the problem becomes NP-hard when moving to phylogenetic networks or to budgeted versions in which protecting taxa incurs non-homogeneous costs. In this paper, we address these two challenges by providing definitions and a comprehensive analysis of three distinct variants of budgeted PD on networks. We conduct our study through the lens of a small structural parameter, node scanwidth (nsw), which measures the "tree-likeness" of a phylogenetic network. We show that two of the considered variants can be optimized in O*(2^nsw B^2) time, where B is the budget. For the computationally harder third variant, we provide an algorithm that computes PD scores in O*(3^nsw) time. Because the utility of algorithms based on nsw depends on the ability to compute nsw and its corresponding decomposition, we also contribute the first exact algorithms for computing node scanwidth. Our approaches integrate data reduction rules, dynamic programming, and an Integer Linear Programming formulation. We validate our theoretical results through extensive experiments on highly reticulated simulated networks containing several hundred taxa with heterogeneous costs. Our implementation computes PD scores and optimal nsw in fractions of a second, even on the most challenging instances. Furthermore, our budgeted optimization algorithms significantly outperform existing benchmarks for computing PD on networks, which were previously limited to unit-cost scenarios. The software makes analyses even on networks with a thousand taxa tracta...
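To make the objective concrete, the following is a minimal sketch of the classical tree-based PD score together with an exhaustive budgeted search. It is not the nsw-based dynamic program described in the abstract and does not handle networks. The toy tree, the integer costs, and the names `TREE`, `COST`, `pd_score`, and `best_budgeted_set` are all hypothetical and chosen only for illustration.

```python
from itertools import combinations

# Hypothetical toy tree: child -> (parent, branch length). "root" has no entry.
TREE = {
    "x": ("root", 1.0),
    "c": ("root", 4.0),
    "a": ("x", 2.0),
    "b": ("x", 3.0),
}
# Hypothetical integer protection costs per taxon (leaf).
COST = {"a": 1, "b": 2, "c": 3}


def pd_score(tree, taxa):
    """Sum the lengths of all edges on a path from the root to a chosen taxon."""
    covered = set()  # each edge is identified by its child endpoint
    for leaf in taxa:
        node = leaf
        # Walk towards the root; stop early once we hit an edge already counted,
        # because all edges above it were counted on an earlier walk.
        while node in tree and node not in covered:
            covered.add(node)
            node = tree[node][0]
    return sum(tree[node][1] for node in covered)


def best_budgeted_set(tree, cost, budget):
    """Brute force: try every taxon subset whose total cost fits the budget."""
    leaves = sorted(cost)
    best = (0.0, ())
    for k in range(1, len(leaves) + 1):
        for subset in combinations(leaves, k):
            if sum(cost[t] for t in subset) <= budget:
                best = max(best, (pd_score(tree, subset), subset))
    return best


print(pd_score(TREE, {"a", "b"}))        # shared edge to "x" is counted once
print(best_budgeted_set(TREE, COST, 4))  # best subset with total cost at most 4
```

The exhaustive search is exponential in the number of taxa and is only meant to pin down what "budgeted PD" optimizes on a tree; the running times quoted above refer to the paper's parameterized algorithms on networks, which this sketch does not attempt to reproduce.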
