The rise of diet-related chronic diseases, such as obesity and diabetes, underscores the need for accurate monitoring of food intake. While AI-driven dietary assessment has made strides in recent years, recovering size (portion) information from a monocular image to answer ``how much did you eat?'' remains an ill-posed and pressing challenge. Some 3D reconstruction methods achieve impressive geometric reconstruction but fail to recover the crucial real-world scale of the reconstructed object, limiting their use in precision nutrition. In this paper, we bridge the gap between 3D computer vision and digital health by proposing a method that recovers a true-to-scale 3D reconstruction from a monocular image. Our approach leverages rich visual features extracted from models trained on large-scale datasets to estimate the scale of the reconstructed object. This learned scale allows us to convert single-view 3D reconstructions into true-to-life, physically meaningful models. Extensive experiments and ablation studies on two publicly available datasets show that our method consistently outperforms existing techniques, reducing mean absolute volume-estimation error by nearly 30% and showcasing its potential to advance precision nutrition. Code: https://gitlab.com/viper-purdue/size-matters
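To make the scale-recovery step concrete, the sketch below shows how a predicted metric scale factor would turn a normalized single-view reconstruction into a physically meaningful volume estimate. This is not the paper's implementation: the mesh (a toy unit cube standing in for a reconstructed food item), the `predicted_scale` value, and the `mesh_volume` helper are all illustrative assumptions; volume is computed with the standard signed-tetrahedron formula for a closed triangle mesh.

```python
import numpy as np

def mesh_volume(vertices, faces):
    """Volume of a closed, outward-oriented triangle mesh via signed tetrahedra."""
    v0, v1, v2 = (vertices[faces[:, i]] for i in range(3))
    # Each triangle forms a tetrahedron with the origin; signed volumes sum
    # to the enclosed volume for a watertight mesh.
    return abs(np.einsum("ij,ij->i", v0, np.cross(v1, v2)).sum()) / 6.0

# Toy stand-in for a scale-normalized reconstruction: a unit cube mesh.
vertices = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0],
                     [0, 0, 1], [1, 0, 1], [1, 1, 1], [0, 1, 1]], dtype=float)
faces = np.array([[0, 2, 1], [0, 3, 2], [4, 5, 6], [4, 6, 7],
                  [0, 1, 5], [0, 5, 4], [2, 3, 7], [2, 7, 6],
                  [0, 4, 7], [0, 7, 3], [1, 2, 6], [1, 6, 5]])

predicted_scale = 0.04  # hypothetical learned scale: 4 cm edge length, in metres
metric_vertices = vertices * predicted_scale  # normalized -> metric coordinates
volume_ml = mesh_volume(metric_vertices, faces) * 1e6  # m^3 -> mL
print(round(volume_ml, 3))  # a 4 cm cube holds 64 mL
```

The key point is that the geometry is fixed up to a single multiplicative factor, so an accurate scale prediction directly determines the portion-volume estimate.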