Single-image point cloud reconstruction must infer complete 3D geometry, including occluded parts, from a single RGB image. While diffusion-based reconstructors achieve high accuracy, they typically require many denoising iterations, resulting in slow and expensive inference. We propose Point-MF, a Mean-Flow-based framework for low-NFE single-image point cloud reconstruction that couples a Mean-Flow-compatible architecture with an auxiliary loss. Specifically, Point-MF operates directly in point-cloud space to learn the mean velocity field, enabling one-step reconstruction with a single network function evaluation (1-NFE) and without relying on VAE-based latent representations. To make Mean Flow effective under large interval jumps, Point-MF employs a Diffusion Transformer tailored to the Mean-Flow setting, conditioned on frozen DINOv3 image features via a lightweight token adapter and equipped with explicit interval/time conditioning. Moreover, we introduce the Denoised-Space Anchor, a set-distance auxiliary loss on the denoised-space estimate $x_\theta$ induced by the predicted velocity field, which stabilizes large-step generation and reduces outliers and density artifacts. On ShapeNet-R2N2 and Pix3D, Point-MF achieves a favorable trade-off between reconstruction quality and inference speed relative to multi-step diffusion baselines and competitive feedforward models, generating high-quality point clouds with millisecond-level latency.
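To make the 1-NFE sampling and the denoised-space estimate concrete, the following is a minimal PyTorch sketch under common Mean-Flow conventions. It assumes the network `u_theta(z, r, t)` predicts the mean velocity over the interval $[r, t]$, so a single evaluation over $[0, 1]$ maps noise to a point cloud, and a symmetric Chamfer distance stands in for the set-distance auxiliary loss. All names here are hypothetical; the paper's exact parameterization, time convention, and loss may differ.

```python
import torch

def one_step_sample(u_theta, z1):
    """Mean-Flow 1-NFE sampling (sketch).

    With the mean velocity over the full interval [0, 1], one network
    evaluation maps Gaussian noise z1 of shape (B, N, 3) to a point cloud:
        x0 = z1 - (1 - 0) * u_theta(z1, r=0, t=1)
    """
    b = z1.shape[0]
    r = torch.zeros(b, device=z1.device)  # interval start
    t = torch.ones(b, device=z1.device)   # interval end
    return z1 - u_theta(z1, r, t)

def chamfer_distance(x, y):
    """Symmetric set distance between point clouds x, y of shape (B, N, 3).

    A loss of this form could anchor the denoised-space estimate x_theta
    to the ground-truth cloud, since point sets are unordered.
    """
    d = torch.cdist(x, y)  # (B, N, M) pairwise Euclidean distances
    return d.min(dim=2).values.mean() + d.min(dim=1).values.mean()
```

As a sanity check, if the network were perfect (its mean velocity exactly displaces the noise onto a target cloud), the one-step sample recovers that cloud and the Chamfer loss vanishes; in practice the auxiliary loss penalizes outliers and uneven density in the large-step estimate.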