TraveLLaMA: A Multimodal Travel Assistant with Large-Scale Dataset and Structured Reasoning

Tourism and travel planning increasingly rely on digital assistance, yet existing multimodal AI systems often lack specialized knowledge and contextual understanding of urban environments. We present TraveLLaMA, a specialized multimodal language model designed for comprehensive travel assistance. Our work addresses the fundamental challenge of developing practical AI travel assistants through three key contributions: (1) TravelQA, a novel dataset of 265k question-answer pairs combining 160k text QA from authentic travel sources, 100k vision-language QA featuring maps and location imagery, and 5k expert-annotated Chain-of-Thought reasoning examples; (2) Travel-CoT, a structured reasoning framework that decomposes travel queries into spatial, temporal, and practical dimensions, improving answer accuracy by 10.8% while providing interpretable decision paths; and (3) an interactive agent system validated through extensive user studies. Through fine-tuning experiments on state-of-the-art vision-language models (LLaVA, Qwen-VL, Shikra), we achieve base improvements of 6.2% to 9.4%, further enhanced by Travel-CoT reasoning. Our model demonstrates superior capabilities in contextual travel recommendations, map interpretation, and scene understanding while providing practical information such as operating hours and cultural insights. User studies with 500 participants show TraveLLaMA achieves a System Usability Scale score of 82.5, significantly outperforming general-purpose models and establishing new standards for multimodal travel assistance systems.
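
For readers unfamiliar with the System Usability Scale (SUS) figure quoted above, the following is a minimal sketch of the standard SUS scoring rule (ten items rated 1 to 5, odd items positively worded, even items negatively worded, total scaled to 0-100). The abstract does not describe how the authors computed their score, so the function names `sus_score` and `mean_sus` and the sample responses are illustrative assumptions, not the paper's evaluation code.

```python
from typing import Sequence


def sus_score(responses: Sequence[int]) -> float:
    """Score one participant's SUS questionnaire (ten items, each rated 1-5).

    Standard rule: odd-numbered items contribute (rating - 1),
    even-numbered items contribute (5 - rating); the sum is scaled by 2.5
    so the result lies between 0 and 100.
    """
    if len(responses) != 10:
        raise ValueError("SUS uses exactly ten items")
    if any(r < 1 or r > 5 for r in responses):
        raise ValueError("each rating must be between 1 and 5")

    total = 0
    for index, rating in enumerate(responses):
        if index % 2 == 0:   # items 1, 3, 5, 7, 9 (positively worded)
            total += rating - 1
        else:                # items 2, 4, 6, 8, 10 (negatively worded)
            total += 5 - rating
    return total * 2.5


def mean_sus(all_responses: Sequence[Sequence[int]]) -> float:
    """Average the per-participant SUS scores across a study."""
    if not all_responses:
        raise ValueError("need at least one participant")
    return sum(sus_score(r) for r in all_responses) / len(all_responses)


if __name__ == "__main__":
    # Hypothetical ratings from two participants (not data from the paper).
    participants = [
        [4, 2, 5, 1, 4, 2, 4, 2, 5, 1],
        [3, 3, 3, 3, 3, 3, 3, 3, 3, 3],
    ]
    print(mean_sus(participants))
```

A study-level figure such as the reported 82.5 would typically be the mean of such per-participant scores; whether the authors aggregated this way is an assumption.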
