Accurate nutrition estimation helps people make informed dietary choices and is essential for preventing serious health complications. We present NutriBench, the first publicly available benchmark for nutrition estimation from natural language meal descriptions. NutriBench consists of 11,857 meal descriptions generated from real-world global dietary intake data. The data is human-verified and annotated with macro-nutrient labels, including carbohydrates, proteins, fats, and calories. We conduct an extensive evaluation on NutriBench for the task of carbohydrate estimation, testing twelve leading Large Language Models (LLMs), including GPT-4o, Llama3.1, Qwen2, Gemma2, and OpenBioLLM models, using standard, Chain-of-Thought, and Retrieval-Augmented Generation strategies. Additionally, we present a study involving professional nutritionists, finding that LLMs can provide comparable but significantly faster estimates. Finally, we perform a real-world risk assessment by simulating the effect of carbohydrate predictions on the blood glucose levels of individuals with diabetes. Our work highlights the opportunities and challenges of using LLMs for nutrition estimation, demonstrating their potential to aid professionals and laypersons and improve health outcomes. Our benchmark is publicly available at: https://mehak126.github.io/nutribench.html