Food portion estimation is crucial for monitoring health and tracking dietary intake. Image-based dietary assessment, which analyzes images of eating occasions using computer vision techniques, is increasingly replacing traditional methods such as 24-hour recalls. However, accurately estimating nutritional content from images remains challenging because 3D information is lost when a scene is projected onto the 2D image plane. Existing portion estimation methods are difficult to deploy in real-world scenarios because they depend on specific requirements, such as physical reference objects, high-quality depth information, or multi-view images and videos. In this paper, we introduce MFP3D, a new framework for accurate food portion estimation using only a single monocular image. Specifically, MFP3D consists of three key modules: (1) a 3D Reconstruction Module that generates a 3D point cloud representation of the food from the 2D image, (2) a Feature Extraction Module that extracts and concatenates features from both the 3D point cloud and the 2D RGB image, and (3) a Portion Regression Module that employs a deep regression model to estimate the food's volume and energy content from the extracted features. We evaluate MFP3D on the MetaFood3D dataset, where it estimates food portions significantly more accurately than existing methods.
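The data flow through the three modules can be illustrated with a minimal Python sketch. It is not the MFP3D implementation: every function name here is hypothetical, the reconstruction step is passed in as a caller-supplied callable, and simple hand-crafted descriptors and linear heads stand in for the learned feature extractors and the deep regression model. Only the overall structure (reconstruct, extract and concatenate 3D and 2D features, regress volume and energy) follows the description above.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float, float]   # (x, y, z)
Pixel = Tuple[float, float, float]   # (r, g, b)
Image = Sequence[Sequence[Pixel]]    # rows of pixels
HeadParams = Tuple[Sequence[float], float]  # (weights, bias)


def point_cloud_features(points: Sequence[Point]) -> List[float]:
    """Toy 3D descriptor (assumption): bounding-box extent along x, y, z."""
    if not points:
        raise ValueError("point cloud is empty")
    return [max(p[i] for p in points) - min(p[i] for p in points)
            for i in range(3)]


def image_features(image: Image) -> List[float]:
    """Toy 2D descriptor (assumption): mean value of each RGB channel."""
    pixels = [px for row in image for px in row]
    if not pixels:
        raise ValueError("image is empty")
    return [sum(px[c] for px in pixels) / len(pixels) for c in range(3)]


def linear_head(features: Sequence[float],
                weights: Sequence[float],
                bias: float) -> float:
    """Stand-in for a learned regressor: a single linear layer."""
    if len(features) != len(weights):
        raise ValueError("feature and weight lengths differ")
    return sum(w * f for w, f in zip(weights, features)) + bias


def estimate_portion(image: Image,
                     reconstruct: Callable[[Image], Sequence[Point]],
                     volume_params: HeadParams,
                     energy_params: HeadParams) -> Tuple[float, float]:
    """Run the three stages and return (volume, energy)."""
    # Stage 1: 3D reconstruction from the single image (caller-supplied).
    points = reconstruct(image)
    # Stage 2: extract 3D and 2D features, then concatenate them.
    fused = point_cloud_features(points) + image_features(image)
    # Stage 3: regress volume and energy from the fused feature vector.
    volume = linear_head(fused, *volume_params)
    energy = linear_head(fused, *energy_params)
    return volume, energy
```

A usage example with made-up inputs: `estimate_portion(image, my_reconstructor, (w_vol, b_vol), (w_en, b_en))`, where `my_reconstructor` is any function mapping an image to a list of 3D points and each weight vector has six entries (three 3D features followed by three 2D features). In a real system each stand-in would be replaced by a trained network.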