Dietary assessment is a key contributor to monitoring health status. Existing self-report methods are tedious and time-consuming, and they suffer from substantial biases and errors. Image-based food portion estimation aims to estimate food energy values directly from food images, showing great potential for automated dietary assessment. Existing image-based methods either use a single-view image, which limits performance, or incorporate multi-view images and depth information, which burdens the user. In this paper, we propose an end-to-end deep learning framework for food energy estimation from a monocular image through 3D shape reconstruction. We leverage a generative model to reconstruct a voxel representation of the food object from the input image, recovering the missing 3D information. We evaluate our method on the publicly available food image dataset Nutrition5k, obtaining a Mean Absolute Error (MAE) of 40.05 kcal and a Mean Absolute Percentage Error (MAPE) of 11.47% for food energy estimation. Our method uses an RGB image as the only input at the inference stage and achieves competitive results compared to the existing method requiring both RGB and depth information.
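For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how MAE and MAPE are commonly computed. It assumes the per-sample definition of MAPE (the mean of each absolute error divided by its ground-truth value); the abstract does not state which variant the authors use, so this may differ from their evaluation code. The function names and sample values are hypothetical and chosen only for illustration.

```python
def mae(y_true, y_pred):
    """Mean Absolute Error, in the same unit as the inputs (e.g. kcal)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, in percent.

    Assumed definition: average of |pred - true| / true over all samples.
    Ground-truth values must be non-zero.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined when a ground-truth value is zero")
    ratios = [abs(p - t) / abs(t) for t, p in zip(y_true, y_pred)]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical ground-truth and predicted energy values in kcal.
true_kcal = [100.0, 200.0, 400.0]
pred_kcal = [110.0, 180.0, 400.0]

print(f"MAE:  {mae(true_kcal, pred_kcal):.2f} kcal")
print(f"MAPE: {mape(true_kcal, pred_kcal):.2f} %")
```

MAE reports the error in energy units, so it is dominated by large dishes, while MAPE normalizes each error by the dish's true energy, which is why the two are usually reported together.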