In cross-border e-commerce, search relevance modeling faces the dual challenge of extreme linguistic diversity and fine-grained semantic nuance. Existing approaches typically rely on scaling up a single monolithic Large Language Model (LLM). However, our empirical analysis reveals that single models suffer from uneven capability distributions across regions: a model may excel in English while underperforming in specific Southeast Asian languages. In this work, we shift the paradigm from scaling a single model to orchestrating heterogeneous experts. We propose a scalable coarse-grained Mixture-of-Experts (MoE) framework that leverages the inherent complementarity of distinct open-source LLMs (e.g., Qwen, Gemma) without expensive pre-training. Unlike standard token-level MoE, our framework dynamically routes entire queries to specialized experts and, crucially, employs an Information-Preserving Concatenation Fusion strategy. We theoretically posit that preserving the distinct embedding manifolds of heterogeneous experts, rather than compressing them via weighted averaging, is essential for capturing complex relevance signals in a multi-model latent space. On datasets spanning six Southeast Asian markets, our MoE improves AUC by 0.72 percentage points over a dense baseline with the same number of active parameters, while the optimized pipeline achieves 13.72 queries per second (QPS), a 9% throughput improvement.
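The contrast between concatenation fusion and weighted averaging can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's implementation: the embedding dimensions, the fixed 0.5/0.5 weights, and the truncation-based projection in the averaging baseline are all assumptions chosen only to show that concatenation keeps each expert's subspace intact while averaging forces everything into one shared space.

```python
import numpy as np

# Hypothetical embeddings from two heterogeneous experts (Qwen-like and
# Gemma-like); dimensions are illustrative, not the actual model sizes.
rng = np.random.default_rng(0)
e_qwen = rng.standard_normal(1024)   # expert A query embedding
e_gemma = rng.standard_normal(2048)  # expert B query embedding

def concat_fusion(*embs):
    """Information-preserving fusion: place expert embeddings side by side,
    so each expert's latent manifold survives unchanged in the fused vector."""
    return np.concatenate(embs)

def averaged_fusion(embs, weights, proj_dim=1024):
    """Weighted-average baseline (assumed form): bring every embedding to a
    shared dimension (here by naive truncation/padding), then mix them, which
    compresses the individual manifolds into one space."""
    projected = [
        e[:proj_dim] if e.shape[0] >= proj_dim
        else np.pad(e, (0, proj_dim - e.shape[0]))
        for e in embs
    ]
    return sum(w * p for w, p in zip(weights, projected))

fused_cat = concat_fusion(e_qwen, e_gemma)
fused_avg = averaged_fusion([e_qwen, e_gemma], weights=[0.5, 0.5])

print(fused_cat.shape)  # (3072,) -- both expert subspaces preserved
print(fused_avg.shape)  # (1024,) -- compressed into one shared space
```

Under this sketch, a downstream relevance head over `fused_cat` can learn expert-specific weights per dimension, whereas with `fused_avg` the mixing is fixed before the head ever sees the signal.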