Accurate dietary intake estimation is critical for informing policies and programs that support healthy eating, as malnutrition has been directly linked to decreased quality of life. However, self-reporting methods such as food diaries suffer from substantial bias. Other conventional dietary assessment techniques, and emerging alternatives such as mobile applications, incur high time costs and may require trained personnel. Recent work has focused on using computer vision and machine learning to automatically estimate dietary intake from food images, but the lack of comprehensive datasets with diverse viewpoints, modalities, and food annotations hinders the accuracy and realism of such methods. To address this limitation, we introduce NutritionVerse-Synth, the first large-scale dataset of 84,984 photorealistic synthetic 2D food images with associated dietary information and multimodal annotations (including depth images, instance masks, and semantic masks). Additionally, we collect a real image dataset, NutritionVerse-Real, containing 889 images of 251 dishes to evaluate realism. Leveraging these novel datasets, we develop and benchmark NutritionVerse, an empirical study of various dietary intake estimation approaches, including indirect segmentation-based and direct prediction networks. We further fine-tune models pretrained on synthetic data with real images to provide insights into the fusion of synthetic and real data. Finally, we release both datasets (NutritionVerse-Synth, NutritionVerse-Real) at https://www.kaggle.com/nutritionverse/datasets as part of an open initiative to accelerate machine learning for dietary sensing.