Accurate nutrition estimation helps people make informed dietary choices and is essential for preventing serious health complications. We present NutriBench, the first publicly available benchmark for nutrition estimation from natural language meal descriptions. NutriBench consists of 11,857 meal descriptions generated from real-world global dietary intake data. The data is human-verified and annotated with macronutrient labels (carbohydrates, proteins, and fats) and calories. We conduct an extensive evaluation of NutriBench on the task of carbohydrate estimation, testing twelve leading Large Language Models (LLMs), including GPT-4o, Llama3.1, Qwen2, Gemma2, and OpenBioLLM models, using standard, Chain-of-Thought, and Retrieval-Augmented Generation strategies. Additionally, we present a study involving professional nutritionists, finding that LLMs can provide more accurate and faster estimates than the nutritionists. Finally, we perform a real-world risk assessment by simulating the effect of carbohydrate predictions on the blood glucose levels of individuals with diabetes. Our work highlights the opportunities and challenges of using LLMs for nutrition estimation, demonstrating their potential to aid professionals and laypersons and improve health outcomes. Our benchmark is publicly available at: https://mehak126.github.io/nutribench.html