Accurate nutrition estimation helps people make informed dietary choices and is essential in the prevention of serious health complications. We present NutriBench, the first publicly available benchmark for estimating nutrition from natural language meal descriptions. NutriBench consists of 11,857 meal descriptions generated from real-world global dietary intake data. The data is human-verified and annotated with labels for carbohydrates, proteins, fats, and calories. Using NutriBench, we conduct an extensive evaluation of carbohydrate estimation, testing twelve leading Large Language Models (LLMs), including GPT-4o, Llama3.1, Qwen2, Gemma2, and OpenBioLLM models, with standard, Chain-of-Thought, and Retrieval-Augmented Generation prompting strategies. Additionally, we present a study involving professional nutritionists, finding that LLMs can provide more accurate and faster estimates than the nutritionists. Finally, we perform a real-world risk assessment by simulating the effect of carbohydrate predictions on the blood glucose levels of individuals with diabetes. Our work highlights the opportunities and challenges of using LLMs for nutrition estimation, demonstrating their potential to aid professionals and laypersons and improve health outcomes. Our benchmark is publicly available at: https://mehak126.github.io/nutribench.html