The Balanced Up-Down Walk

Markov chains based on spanning trees have been hugely influential in algorithms for assessing fairness in political redistricting. The input graph represents the geographic building blocks of a jurisdiction. The goal is to output a large ensemble of random graph partitions, which is done by drawing random spanning trees and splitting them into subtrees. Crucially, these subtrees must be balanced, since political districts are required to have equal population. The Up-Down walk (on trees or forests) repeatedly adds a random edge and then deletes a random edge to produce a new tree or forest; it can be used to efficiently generate a large ensemble, but the rejection rate needed to maintain balance grows exponentially with the number of parts. ReCom, the most widely used class of Markov chains, circumvents this complexity barrier by merging and splitting one pair of districts at a time. This runs fast in practice but can have trouble exploring the state space. To overcome these efficiency and mixing barriers, we propose a new Markov chain called the Balanced Up-Down (BUD) walk. The main idea is to run the Up-Down walk on the space of trees, but require every step to preserve the property that the tree is splittable into balanced subtrees. Under exact balance, the BUD walk samples from a known invariant measure. We prove that the BUD walk is irreducible in several cases, including a regime where ReCom is not irreducible. Running the BUD walk efficiently presents algorithmic challenges, especially when parts are allowed to deviate from their ideal size. A key subroutine is determining whether a tree is splittable into approximately balanced subtrees. We give an improved analysis of an existing algorithm for this problem and prove that the associated counting problem is #P-complete. We empirically validate the usefulness of the BUD walk by comparing its performance to that of other existing methods for sampling partitions.
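To make the main idea concrete, the following is a minimal Python sketch of one splittability-preserving Up-Down step in the exact-balance case. It is an illustration under stated assumptions, not the paper's algorithm: the function names `is_splittable` and `bud_step`, the uniform choice of the added edge and of the deleted cycle edge, and the plain accept-or-stay rule are all hypothetical, and the abstract does not specify the actual transition probabilities. Node weights are assumed to be positive integers, in which case a bottom-up greedy pass decides exact splittability.

```python
import random
from collections import defaultdict


def is_splittable(tree_edges, weight, k):
    """True if the spanning tree can be cut into k connected parts of equal weight.

    Assumes positive integer weights and that tree_edges spans every node in weight.
    """
    total = sum(weight.values())
    if total % k != 0:
        return False
    target = total // k
    adj = defaultdict(list)
    for a, b in tree_edges:
        adj[a].append(b)
        adj[b].append(a)
    root = next(iter(weight))
    parent = {root: None}
    order, stack = [], [root]
    while stack:  # iterative DFS; every parent precedes its children in `order`
        x = stack.pop()
        order.append(x)
        for y in adj[x]:
            if y not in parent:
                parent[y] = x
                stack.append(y)
    residual = dict(weight)  # weight hanging below each node that is not yet cut off
    parts = 0
    for x in reversed(order):  # children are processed before their parents
        if residual[x] > target:
            return False  # this piece can no longer be completed to exact size
        if residual[x] == target:
            parts += 1  # cut the edge above x (or close the final part at the root)
        elif parent[x] is None:
            return False  # leftover weight at the root
        else:
            residual[parent[x]] += residual[x]
    return parts == k


def bud_step(graph_edges, tree_edges, weight, k, rng):
    """One hypothetical step: add a non-tree edge, delete a cycle edge, keep if splittable."""
    in_tree = {frozenset(e) for e in tree_edges}
    non_tree = [e for e in graph_edges if frozenset(e) not in in_tree]
    if not non_tree:
        return list(tree_edges)
    u, v = rng.choice(non_tree)  # "up": adding (u, v) closes exactly one cycle
    adj = defaultdict(list)
    for a, b in tree_edges:
        adj[a].append(b)
        adj[b].append(a)
    parent = {u: None}
    queue = [u]
    for x in queue:  # BFS from u records the tree path to every node
        for y in adj[x]:
            if y not in parent:
                parent[y] = x
                queue.append(y)
    cycle = [(u, v)]
    x = v
    while parent[x] is not None:  # walk the tree path from v back to u
        cycle.append((x, parent[x]))
        x = parent[x]
    drop = frozenset(rng.choice(cycle))  # "down": delete one edge of the cycle
    proposal = [e for e in list(tree_edges) + [(u, v)] if frozenset(e) != drop]
    return proposal if is_splittable(proposal, weight, k) else list(tree_edges)
```

For example, on four unit-weight nodes with k = 2, the path 0-1-2-3 is splittable (cut the middle edge), while the star centered at node 0 is not, so a step that would turn the path into the star is rejected and the chain stays where it is. The approximately balanced case discussed in the abstract is harder and is not covered by this greedy check.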
