AGIPC: Adaptive In-Solve Algebraic Coarsening for GPU IPC

Implicit time integration is key to robustly simulating stiff materials and large deformations, but its cost is often dominated by repeatedly solving large linear systems. Adaptive coarsening can reduce this cost by concentrating degrees of freedom (DoF) where they are most needed, yet conventional explicit remeshing changes connectivity (and often vertex ordering), which complicates parallel implementations, harms memory locality, and is sometimes disallowed because it may introduce local geometric intersections. Adaptive subspace approaches avoid topological changes, but basis construction and updates incur irregular data access patterns and typically produce dense system matrices, limiting GPU efficiency and keeping many practical systems CPU-centric. We present algebraic adaptive in-solve coarsening, a GPU-oriented method that dynamically reduces DoF within the Newton solve of implicit time integration without explicit topological modification. Starting from a fine mesh, we express adaptivity as a selective edge-collapse process governed by per-edge tags. Collapsible edges are aggregated in parallel using a warp-level hash mapping scheme that groups fine vertices into coarse super-nodes, while protected edges preserve local detail. This defines an implicit coarse mesh whose linear system is assembled algebraically by mapping and reducing fine-scale gradients and Hessians via efficient GPU reduction kernels. We solve the resulting coarse system with a preconditioned conjugate gradient (PCG) method and then prolongate the solution back to the fine mesh. Our approach integrates seamlessly with IPC's barrier energy and exploits GPU parallelism end-to-end. Across a range of challenging scenarios, we achieve up to a 3x speedup over a state-of-the-art GPU IPC solver while producing visually indistinguishable results.
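
The pipeline summarized above (aggregate, reduce, solve, prolongate) can be illustrated with a small CPU sketch. This is not the paper's implementation. It assumes one scalar DoF per vertex, a piecewise-constant prolongation (every fine vertex in a super-node receives the same coarse value), a sequential union-find in place of the warp-level hash mapping, and plain CG in place of PCG. All function names and the toy 4-vertex chain are hypothetical.

```python
def aggregate(num_vertices, edges, collapsible):
    """Group fine vertices joined by collapsible edges into super-nodes.

    Assumption: sequential union-find stands in for the parallel hash mapping.
    Returns (coarse index of each fine vertex, number of super-nodes).
    """
    parent = list(range(num_vertices))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for (a, b), tag in zip(edges, collapsible):
        if tag:  # protected edges (tag False) are never merged
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[max(ra, rb)] = min(ra, rb)

    roots = sorted({find(v) for v in range(num_vertices)})
    coarse_id = {r: i for i, r in enumerate(roots)}
    return [coarse_id[find(v)] for v in range(num_vertices)], len(roots)


def coarsen(fine_to_coarse, n_coarse, grad, hess):
    """Reduce the fine gradient and sparse Hessian {(i, j): value} onto super-nodes."""
    g_c = [0.0] * n_coarse
    for i, g in enumerate(grad):
        g_c[fine_to_coarse[i]] += g
    h_c = {}
    for (i, j), h in hess.items():
        key = (fine_to_coarse[i], fine_to_coarse[j])
        h_c[key] = h_c.get(key, 0.0) + h
    return g_c, h_c


def conjugate_gradient(h, b, tol=1e-12, max_iter=200):
    """Unpreconditioned CG for a sparse symmetric positive definite system h x = b."""
    n = len(b)

    def matvec(x):
        y = [0.0] * n
        for (i, j), v in h.items():
            y[i] += v * x[j]
        return y

    x = [0.0] * n
    r = list(b)
    p = list(r)
    rr = sum(v * v for v in r)
    for _ in range(max_iter):
        if rr <= tol:  # squared residual norm small enough
            break
        hp = matvec(p)
        alpha = rr / sum(pi * hi for pi, hi in zip(p, hp))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * hi for ri, hi in zip(r, hp)]
        rr_new = sum(v * v for v in r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x


def prolongate(fine_to_coarse, x_c):
    """Copy each super-node's value back to all of its fine vertices."""
    return [x_c[c] for c in fine_to_coarse]


# Toy problem: a 4-vertex chain whose middle edge is protected.
edges = [(0, 1), (1, 2), (2, 3)]
collapsible = [True, False, True]
grad = [1.0, 1.0, 0.0, 0.0]
hess = {(0, 0): 2.0, (0, 1): -1.0, (1, 0): -1.0, (1, 1): 3.0,
        (1, 2): -1.0, (2, 1): -1.0, (2, 2): 3.0, (2, 3): -1.0,
        (3, 2): -1.0, (3, 3): 2.0}

fine_to_coarse, n_coarse = aggregate(4, edges, collapsible)
g_c, h_c = coarsen(fine_to_coarse, n_coarse, grad, hess)
dx_coarse = conjugate_gradient(h_c, [-g for g in g_c])  # Newton step: H dx = -g
dx_fine = prolongate(fine_to_coarse, dx_coarse)
# dx_fine is approximately [-0.75, -0.75, -0.25, -0.25]
```

In this toy case the four fine DoF collapse to two super-nodes, so the linear solve runs on a 2x2 system instead of a 4x4 one, and the fine mesh connectivity is never modified. With a piecewise-constant prolongation P, the reduction step computes P^T g and P^T H P without ever forming P explicitly.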
