Scene-level point cloud understanding remains challenging due to diverse geometries, imbalanced category distributions, and highly varied spatial layouts. Existing methods have improved object-level performance, but they keep network parameters fixed during inference, which limits their ability to adapt to varying scene data. We propose PointTPA, a Test-time Parameter Adaptation framework that generates input-aware network parameters for scene-level point clouds. PointTPA uses Serialization-based Neighborhood Grouping (SNG) to form locally coherent patches and a Dynamic Parameter Projector (DPP) to produce patch-wise adaptive weights. Together they let the backbone adjust its behavior to scene-specific variations while keeping parameter overhead low. Integrated into the PTv3 architecture, PointTPA adds two lightweight modules that together amount to less than 2% of the backbone's parameters. Despite this minimal overhead, PointTPA achieves 78.4% mIoU on the ScanNet validation set and surpasses existing parameter-efficient fine-tuning (PEFT) methods across multiple benchmarks, highlighting the efficacy of test-time dynamic parameter adaptation for 3D scene understanding. The code is available at https://github.com/H-EmbodVis/PointTPA.
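The abstract does not say how SNG orders points or how DPP parameterizes its weights, so the following is only a minimal NumPy sketch of the general idea under stated assumptions. It assumes (1) Z-order (Morton) serialization of voxelized coordinates, one of the space-filling curves PTv3 uses for serialization; (2) fixed-size contiguous patches cut from the sorted sequence; and (3) a patch-wise low-rank update `x @ A @ diag(s_p) @ B` added to a frozen base weight, with the coefficients `s_p` generated from each patch's mean feature. The names `morton_code`, `serialize_and_group`, `DynamicParamProjector`, `grid_size`, `patch_size`, and `rank` are hypothetical and are not taken from the PointTPA code.

```python
import numpy as np

def morton_code(grid: np.ndarray, bits: int = 10) -> np.ndarray:
    """Interleave the bits of integer (x, y, z) grid coordinates into a Z-order key."""
    grid = np.clip(grid.astype(np.int64), 0, (1 << bits) - 1)  # keep each axis within `bits` bits
    code = np.zeros(grid.shape[0], dtype=np.int64)
    for b in range(bits):
        for axis in range(3):
            code |= ((grid[:, axis] >> b) & 1) << (3 * b + axis)
    return code

def serialize_and_group(points: np.ndarray, grid_size: float = 0.02, patch_size: int = 64):
    """Hypothetical SNG sketch: voxelize, sort by Z-order key, cut into fixed-size patches.
    Returns the sort order and a patch id for each position in the sorted sequence."""
    grid = np.floor((points - points.min(axis=0)) / grid_size).astype(np.int64)
    order = np.argsort(morton_code(grid), kind="stable")
    patch_id = np.arange(len(points)) // patch_size
    return order, patch_id

class DynamicParamProjector:
    """Hypothetical DPP sketch: frozen base weight W plus a patch-specific low-rank update
    A @ diag(s_p) @ B, where s_p is projected from the patch's mean feature."""

    def __init__(self, dim: int, rank: int = 4, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((dim, dim)) / np.sqrt(dim)  # static (frozen) backbone weight
        self.A = rng.standard_normal((dim, rank)) * 0.1          # shared down-projection
        self.B = rng.standard_normal((rank, dim)) * 0.1          # shared up-projection
        self.P = rng.standard_normal((dim, rank)) * 0.1          # projector: patch context -> s_p

    def __call__(self, feats: np.ndarray, patch_id: np.ndarray) -> np.ndarray:
        out = feats @ self.W                                     # input-independent path
        for p in np.unique(patch_id):
            mask = patch_id == p
            ctx = feats[mask].mean(axis=0)                       # patch descriptor, shape (dim,)
            s = np.tanh(ctx @ self.P)                            # patch-wise coefficients, shape (rank,)
            out[mask] += ((feats[mask] @ self.A) * s) @ self.B   # x @ A @ diag(s) @ B
        return out

# Usage on a random toy scene.
rng = np.random.default_rng(0)
pts = rng.uniform(0.0, 4.0, size=(1000, 3))
feats = rng.standard_normal((1000, 32))
order, patch_id = serialize_and_group(pts, patch_size=64)
dpp = DynamicParamProjector(dim=32, rank=4)
out_sorted = dpp(feats[order], patch_id)
out = np.empty_like(out_sorted)
out[order] = out_sorted                                          # restore the original point order
print(out.shape)  # (1000, 32)
```

In this sketch the extra parameters (`A`, `B`, `P`) grow as 3·dim·rank rather than dim², which is one way to keep the overhead small relative to the frozen weight; the actual PointTPA projector may be parameterized differently.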