TourMart: A Parametric Audit Instrument for Commission Steering in LLM Travel Agents

Online travel agents (Booking, Trip.com, Expedia) have replaced ranked-list interfaces with conversational LLM agents that compress many options into one sentence of advice. Each booking earns the OTA a commission, and suppliers pay different rates, so the agent has a structural incentive to favor higher-margin recommendations. No one can currently measure whether any deployed agent does this, or by how much. Disclosure banners, conversion A/B testing, UI dark-pattern taxonomies, and generic LLM safety scores were built for older interfaces and miss the prose-recommendation surface where the steering happens. We propose TourMart, an applied intelligent-system audit instrument for LLM-OTA commission governance. Two governance levers -- lambda (gain on message-induced perception in the traveler's accept/reject decision) and kappa (budget-normalized cap on how far the message can shift perceived welfare) -- drive a paired counterfactual: holding the traveler and bundle fixed, the steering delta is the difference in outcomes between a commission-aware prompt and a minimum-disclosure factual template. A symmetric six-gate producer audit separates LLM-engineering failures (template collapse, refusal, internal-ID leakage) from genuine commercial steering. At the deployed setting (lambda=1, kappa=0.05), a Qwen-14B reader shows +7.69pp steering (exact McNemar p=0.003); a Llama-3.1-8B reader shows +3.50pp in the same direction at n=143, with an extended-n supplement (n=270) confirming significance (+2.96pp, p=0.008). Across the (lambda, kappa) grid both arms pass family-wise scenario-clustered correction (p<0.001 / p=0.008). TourMart outputs a sentence a compliance report can quote: "at this deployment, 7.7 extra commission-steered recommendations per 100 paired traveler sessions."
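To make the paired-counterfactual readout concrete, the following is a minimal sketch of how a steering delta and an exact McNemar p-value could be computed from paired outcomes. The function names and the discordant-pair counts (12 and 1) are illustrative assumptions: the abstract reports only the net delta and the p-value, not the underlying 2x2 table, and this is not the authors' implementation.

```python
from math import comb


def steering_delta_pp(b: int, c: int, n: int) -> float:
    """Net steering in percentage points over n paired sessions.

    b: pairs steered under the commission-aware prompt but not the factual template
    c: pairs steered under the factual template but not the commission-aware prompt
    """
    return 100.0 * (b - c) / n


def exact_mcnemar_p(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the discordant counts.

    Under the null, each discordant pair falls on either side with probability 0.5,
    so the smaller count follows Binomial(b + c, 0.5).
    """
    m = b + c
    if m == 0:
        return 1.0  # no discordant pairs, so no evidence either way
    k = min(b, c)
    tail = sum(comb(m, i) for i in range(k + 1)) / 2**m
    return min(1.0, 2.0 * tail)


# Hypothetical counts (assumed for illustration, not taken from the paper).
b, c, n = 12, 1, 143
print(round(steering_delta_pp(b, c, n), 2))  # 7.69
print(round(exact_mcnemar_p(b, c), 4))       # 0.0034
```

Only the discordant pairs enter the test. Sessions where both prompts produce the same outcome affect the denominator of the delta but not the p-value, which is why the same traveler and bundle must be held fixed across the two arms.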
