Dietary assessment is essential to maintaining a healthy lifestyle. Automatic image-based dietary assessment is a growing field of research due to the increasing prevalence of image-capturing devices (e.g., mobile phones). In this work, we estimate food energy from a single monocular image, a difficult task because the energy information present in an image is limited and hard to extract. To do so, we employ an improved encoder-decoder framework for energy estimation: the encoder transforms the image into a representation in which food energy information is embedded in an easier-to-extract format, and the decoder then extracts the energy information from that representation. To implement our method, we compile a high-quality food image dataset, verified by registered dietitians, containing eating scene images, food-item segmentation masks, and ground-truth calorie values. Our method improves upon previous caloric estimation methods by over 10% in mean absolute percentage error (MAPE) and 30 kcal in mean absolute error (MAE).
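For reference, the two error metrics quoted in the abstract can be sketched as follows. This is a minimal illustration using the standard definitions of MAE and MAPE, not the authors' evaluation code; the function names and the calorie values in the usage lines are hypothetical.

```python
from typing import Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute difference, in the same unit as the inputs (here kcal)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_percentage_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute error relative to the ground truth, as a percentage."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined when a ground-truth value is zero")
    return 100.0 * sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical ground-truth and predicted energy values (kcal) for three images.
true_kcal = [500.0, 250.0, 800.0]
pred_kcal = [450.0, 300.0, 760.0]

mae = mean_absolute_error(true_kcal, pred_kcal)
mape = mean_absolute_percentage_error(true_kcal, pred_kcal)
print(f"MAE: {mae:.2f} kcal, MAPE: {mape:.2f}%")
```

MAE is reported in kilocalories and so is dominated by high-energy meals, while MAPE normalizes each error by the true value, which is why the abstract reports both.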