Meta-Modal Agent: Sequential Evidence Routing for Missing-Modality Candidate Reranking

Missing modalities cause severe failures in multimodal recommender systems. User histories, item text, and visual evidence are frequently absent during cold-start scenarios, precisely when recommendation quality matters most. Existing approaches recover absent signals through imputation, feature propagation, or generative reconstruction, but these strategies can inject unsupported evidence when the surviving signals are weak. We introduce the Meta-Modal Agent (MMA), a large-language-model-based candidate-pool reranker that treats missingness as a sequential evidence-routing problem. MMA is trained with balanced missingness-task reinforcement learning over masked-modality episodes and is evaluated in two variants: MMA-Auto, which uses only automated text, image, and graph tools, and MMA-Interactive, which additionally permits clarification questions grounded in surviving modalities as an upper-bound diagnostic. MMA operates after a first-stage retriever has produced a candidate pool; it scores those candidates rather than retrieving items from the full catalog. Final reranking fuses MMA scores with first-stage retrieval scores selected on validation data. Our evaluation is organized around four evidence checks required for a robust missing-modality claim: oracle-free one-observed-modality availability (OOMA) robustness, per-modality OOMA breakdowns, fixed-pool full-catalog reranking, and a deterministic-router mechanism control. MMA-Auto improves target-positive OOMA NDCG@10 by 4.0% and fixed-pool full-catalog reranking NDCG@10 by 12.7% over the strongest non-interactive baseline. RuleRouter-Fuse, which uses the same tools and fusion rule without learned policy updates, underperforms MMA-Auto, supporting learned routing beyond deterministic tool fusion. MMA-Interactive adds a 4.1% upper-bound gain when clarification is available.
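To make the fusion and evaluation steps concrete, the following is a minimal sketch and not the implementation behind the reported numbers. It assumes that fusion is a convex combination of min-max-normalized reranker and first-stage scores, and that a single weight `alpha` is chosen on validation pools by mean NDCG@10. The abstract does not specify the fusion form or what exactly is selected on validation data, so the function names (`fuse`, `select_alpha`), the weight grid, and the normalization are all hypothetical.

```python
import math

def ndcg_at_k(ranked_labels, k=10):
    """NDCG@k for relevance labels listed in ranked order (best rank first)."""
    def dcg(labels):
        return sum(rel / math.log2(i + 2) for i, rel in enumerate(labels[:k]))
    ideal = dcg(sorted(ranked_labels, reverse=True))
    return dcg(ranked_labels) / ideal if ideal > 0 else 0.0

def min_max(scores):
    """Rescale scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]

def fuse(reranker_scores, retriever_scores, alpha):
    """Assumed fusion rule: alpha * reranker + (1 - alpha) * retriever."""
    a = min_max(reranker_scores)
    b = min_max(retriever_scores)
    return [alpha * x + (1 - alpha) * y for x, y in zip(a, b)]

def rerank(labels, fused):
    """Return the labels reordered by descending fused score."""
    order = sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)
    return [labels[i] for i in order]

def select_alpha(val_pools, grid=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Pick the grid weight with the highest mean NDCG@10 on validation pools.

    Each pool is (reranker_scores, retriever_scores, labels) for one query.
    """
    def mean_ndcg(alpha):
        vals = [ndcg_at_k(rerank(labels, fuse(r, f, alpha)))
                for r, f, labels in val_pools]
        return sum(vals) / len(vals)
    return max(grid, key=mean_ndcg)
```

Under these assumptions, `select_alpha` would be run once on validation candidate pools, and the chosen weight would then be held fixed when reranking test pools, so that no test labels influence the fusion.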
