Dietary assessment is essential to maintaining a healthy lifestyle. Automatic image-based dietary assessment is a growing field of research due to the increasing prevalence of image-capturing devices (e.g., mobile phones). In this work, we estimate food energy from a single monocular image, a difficult task because the energy information present in an image is limited and hard to extract. To do so, we employ an improved encoder-decoder framework for energy estimation: the encoder transforms the image into a representation in which food energy information is embedded in an easier-to-extract format, and the decoder then extracts the energy information from that representation. To implement our method, we compile a high-quality food image dataset, verified by registered dietitians, containing eating scene images, food-item segmentation masks, and ground truth calorie values. Our method improves upon previous caloric estimation methods by over 10\% in mean absolute percentage error (MAPE) and 30 kCal in mean absolute error (MAE).
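The two reported metrics can be illustrated with a minimal sketch. It assumes the standard definitions of MAE and MAPE, since the abstract does not spell out the exact formulas used; the function names and the calorie values below are hypothetical and chosen only for illustration, not taken from the dataset.

```python
def mae(y_true, y_pred):
    """Mean absolute error, in the same unit as the inputs (here kCal)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent.

    Assumes every ground truth value is non-zero.
    """
    ratios = (abs(t - p) / abs(t) for t, p in zip(y_true, y_pred))
    return 100.0 * sum(ratios) / len(y_true)


# Hypothetical ground truth and predicted energy values (kCal) for three images.
true_kcal = [200.0, 400.0, 500.0]
pred_kcal = [180.0, 440.0, 500.0]

print(f"MAE:  {mae(true_kcal, pred_kcal):.2f} kCal")
print(f"MAPE: {mape(true_kcal, pred_kcal):.2f} %")
```

MAE is expressed in kCal and is dominated by high-energy meals, while MAPE normalizes each error by the true value, which is presumably why both are reported together.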