Standardizing food terms from product labels and menus into ontology concepts is a prerequisite for trustworthy dietary assessment and safety reporting. The dominant approach to Named Entity Linking (NEL) in the food and nutrition domains fine-tunes Large Language Models (LLMs) on task-specific corpora. Although effective, fine-tuning incurs substantial computational cost and ties models to a particular ontology snapshot (i.e., version), and the resulting models degrade under ontology drift. This paper presents FoodOntoRAG, a model- and ontology-agnostic pipeline that performs few-shot NEL by retrieving candidate entities from domain ontologies and conditioning an LLM on structured evidence (food labels, synonyms, definitions, and relations). A hybrid lexical--semantic retriever enumerates candidates; a selector agent chooses the best match and states its rationale; a separate scorer agent calibrates confidence; and, when confidence falls below a threshold, a synonym generator agent proposes reformulations that re-enter the loop. The pipeline approaches state-of-the-art accuracy while revealing gaps and inconsistencies in existing annotations. The design avoids fine-tuning, improves robustness to ontology evolution, and yields interpretable decisions through grounded justifications.
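The control flow described above (retrieve, select, score, and reformulate when confidence is low) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `Candidate`, `LinkResult`, and `link_mention`, the threshold of 0.7, the cap of three queries, and the toy ontology with `TOY:` identifiers are all assumptions made for illustration. The four stages are passed in as plain callables, and simple toy functions stand in for the hybrid retriever and the LLM-based agents.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Candidate:
    concept_id: str                  # hypothetical identifier format
    label: str
    synonyms: Tuple[str, ...] = ()
    definition: str = ""


@dataclass
class LinkResult:
    mention: str
    candidate: Optional[Candidate]
    confidence: float
    rationale: str
    queries_tried: List[str]


def link_mention(
    mention: str,
    retrieve: Callable[[str], List[Candidate]],
    select: Callable[[str, List[Candidate]], Tuple[Candidate, str]],
    score: Callable[[str, Candidate, str], float],
    propose_synonyms: Callable[[str], List[str]],
    threshold: float = 0.7,          # assumed value, not taken from the paper
    max_rounds: int = 3,             # assumed cap on the number of queries
) -> LinkResult:
    queue = [mention]
    tried: List[str] = []
    best = LinkResult(mention, None, 0.0, "no candidate found", tried)
    while queue and len(tried) < max_rounds:
        query = queue.pop(0)
        if query in tried:
            continue
        tried.append(query)
        candidates = retrieve(query)
        if candidates:
            chosen, rationale = select(query, candidates)
            confidence = score(query, chosen, rationale)
            if confidence > best.confidence:
                best = LinkResult(mention, chosen, confidence, rationale, tried)
            if confidence >= threshold:
                break                # confident enough: stop reformulating
        # No candidates or low confidence: enqueue reformulations of the query.
        queue.extend(propose_synonyms(query))
    return best


# Toy stand-ins for the retriever and the three agents (illustrative only).
ONTOLOGY = [
    Candidate("TOY:001", "aubergine", ("eggplant",), "fruit of Solanum melongena"),
    Candidate("TOY:002", "chickpea", ("garbanzo bean",), "seed of Cicer arietinum"),
]


def toy_retrieve(query: str) -> List[Candidate]:
    q = query.lower()
    return [c for c in ONTOLOGY if q == c.label or q in c.synonyms]


def toy_select(query: str, candidates: List[Candidate]) -> Tuple[Candidate, str]:
    top = candidates[0]
    return top, f"'{query}' matches a label or synonym of {top.concept_id}"


def toy_score(query: str, candidate: Candidate, rationale: str) -> float:
    return 0.9 if query.lower() == candidate.label else 0.6


def toy_synonyms(query: str) -> List[str]:
    return {"brinjal": ["eggplant", "aubergine"]}.get(query.lower(), [])


result = link_mention("brinjal", toy_retrieve, toy_select, toy_score, toy_synonyms)
print(result.candidate, result.confidence, result.queries_tried)
```

In this toy run the mention "brinjal" retrieves nothing, the reformulation "eggplant" matches only a synonym and scores below the threshold, and the second reformulation "aubergine" matches the preferred label and ends the loop. Keeping the stages as injected callables mirrors the model- and ontology-agnostic framing: any of them could be swapped without touching the loop.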