Accurately estimating an object's volume and surface area from visual data remains an open challenge with implications for many domains. We propose a unified framework that predicts both quantities directly from a set of 2D multi-view images. The approach first reconstructs a point cloud from the captured views using recent 3D reconstruction techniques, while a parallel 2D encoder aggregates view-aligned features. A fusion module then aligns and merges the 3D geometry with the 2D visual embeddings, and a graph-based decoder regresses volume, surface area, and their corresponding uncertainties. The architecture remains robust to sparse or noisy data. We evaluate the framework in three application domains: corals, where precise geometric measurements support growth monitoring; food items, where volume prediction supports dietary tracking and portion analysis; and human bodies, where volumetric cues are crucial for anthropometric and medical applications. Experimental results show reliable performance across these scenarios, highlighting the framework's versatility and adaptability. By coupling 3D reconstruction with neural regression and 2D features, the model offers a scalable and fast solution for quantitative shape analysis from visual data.
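The data flow described in the abstract (point cloud, view-aligned 2D features, fusion, graph-based regression with uncertainties) can be illustrated with a small NumPy forward pass. This is only a sketch under stated assumptions, not the authors' implementation: the abstract does not specify the fusion operator, the graph construction, or the uncertainty parameterisation. Here fusion is assumed to be concatenation of point coordinates with features sampled by pinhole projection, the graph is assumed to be a k-nearest-neighbour graph with mean aggregation, and uncertainty is assumed to be a predicted log-variance. All function names, shapes, and the randomly initialised weights are hypothetical.

```python
import numpy as np

def knn_graph(points, k):
    """Indices (N, k) of each point's k nearest neighbours, excluding itself."""
    diff = points[:, None, :] - points[None, :, :]
    dist2 = (diff ** 2).sum(axis=-1)
    np.fill_diagonal(dist2, np.inf)  # a point is never its own neighbour
    return np.argsort(dist2, axis=1)[:, :k]

def sample_view_features(points, feat_maps, cameras):
    """Project every 3D point into each view and average the sampled features.

    feat_maps: (V, H, W, C) stand-in for 2D encoder outputs.
    cameras:   (V, 3, 4) pinhole projection matrices (assumed known).
    """
    n_views, height, width, channels = feat_maps.shape
    homog = np.hstack([points, np.ones((len(points), 1))])  # (N, 4)
    gathered = np.zeros((len(points), channels))
    for fmap, proj in zip(feat_maps, cameras):
        uvw = homog @ proj.T                 # (N, 3) homogeneous pixels
        uv = uvw[:, :2] / uvw[:, 2:3]        # perspective divide
        col = np.clip(np.rint(uv[:, 0]).astype(int), 0, width - 1)
        row = np.clip(np.rint(uv[:, 1]).astype(int), 0, height - 1)
        gathered += fmap[row, col]           # nearest-pixel lookup
    return gathered / n_views

def softplus(x):
    """Smooth positive mapping, used so volume and area stay above zero."""
    return np.logaddexp(0.0, x)

def predict(points, feat_maps, cameras, params, k=8):
    """Fuse geometry with image features, run one graph layer, regress outputs."""
    img_feat = sample_view_features(points, feat_maps, cameras)
    fused = np.concatenate([points, img_feat], axis=1)     # assumed fusion
    hidden = np.maximum(fused @ params["w_in"], 0.0)       # per-point ReLU layer
    nbrs = knn_graph(points, k)
    neighbour_mean = hidden[nbrs].mean(axis=1)             # mean aggregation
    hidden = np.maximum(
        hidden @ params["w_self"] + neighbour_mean @ params["w_nbr"], 0.0
    )
    pooled = hidden.mean(axis=0)                           # global readout
    out = pooled @ params["w_out"]                         # 4 raw outputs
    return {
        "volume": softplus(out[0]),
        "area": softplus(out[1]),
        "volume_std": np.exp(0.5 * out[2]),  # out[2] treated as log-variance
        "area_std": np.exp(0.5 * out[3]),    # out[3] treated as log-variance
    }

# Toy inputs: random points, random "encoder" features, one camera reused per view.
rng = np.random.default_rng(0)
n_points, n_views, channels, hidden_dim = 64, 3, 4, 16
points = rng.uniform(-0.5, 0.5, size=(n_points, 3))
feat_maps = rng.normal(size=(n_views, 16, 16, channels))
intrinsics = np.array([[20.0, 0.0, 8.0], [0.0, 20.0, 8.0], [0.0, 0.0, 1.0]])
extrinsics = np.hstack([np.eye(3), np.array([[0.0], [0.0], [3.0]])])
cameras = np.stack([intrinsics @ extrinsics] * n_views)
params = {
    "w_in": rng.normal(scale=0.1, size=(3 + channels, hidden_dim)),
    "w_self": rng.normal(scale=0.1, size=(hidden_dim, hidden_dim)),
    "w_nbr": rng.normal(scale=0.1, size=(hidden_dim, hidden_dim)),
    "w_out": rng.normal(scale=0.1, size=(hidden_dim, 4)),
}
result = predict(points, feat_maps, cameras, params)
print(result)
```

With untrained random weights the printed numbers carry no meaning; the sketch only shows how the four outputs (two estimates and two uncertainties) could be produced from a point cloud and per-view feature maps. In a trained system the log-variance outputs would typically be fitted with a Gaussian negative log-likelihood loss, which is again an assumption rather than something the abstract states.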