In this work, we presented Pailitao-VL, a comprehensive multi-modal retrieval system engineered for high-precision, real-time industrial search. We addressed three critical challenges in current state-of-the-art (SOTA) solutions: insufficient retrieval granularity, vulnerability to environmental noise, and a prohibitive efficiency-performance gap. Our primary contribution lies in two fundamental paradigm shifts. First, we moved the embedding paradigm from traditional contrastive learning to an absolute ID-recognition task. By anchoring instances to a globally consistent latent space defined by billions of semantic prototypes, we overcome the stochasticity and granularity bottlenecks inherent in existing embedding solutions. Second, we evolved the generative reranker from isolated pointwise evaluation to a compare-and-calibrate listwise policy. By combining chunk-based comparative reasoning with calibrated absolute relevance scoring, the system achieves fine-grained discrimination while avoiding the prohibitive latency typically associated with conventional reranking methods. Extensive offline benchmarks and online A/B tests on Alibaba's e-commerce platform confirm that Pailitao-VL achieves state-of-the-art performance and delivers substantial business impact. This work demonstrates a robust and scalable path for deploying advanced MLLM-based retrieval architectures in demanding, large-scale production environments.
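To make the first paradigm shift concrete, the following is a minimal sketch of what an ID-recognition objective can look like, in contrast to an in-batch contrastive loss: each embedding is classified against a fixed table of prototypes, so the target does not depend on which negatives happen to be sampled into a batch. This is an illustration under stated assumptions, not the Pailitao-VL implementation. The function names (`id_recognition_loss`, `predict_id`), the cosine-similarity logits, the `temperature` value, and the three toy prototypes are all hypothetical; a system with billions of prototypes would also need a sharded or sampled softmax, which is omitted here.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def id_recognition_loss(embedding, prototypes, target_id, temperature=0.1):
    """Cross-entropy of one embedding against a fixed prototype table.

    Assumption: logits are temperature-scaled cosine similarities.
    """
    e = l2_normalize(embedding)
    logits = [dot(e, l2_normalize(p)) / temperature for p in prototypes]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[target_id]


def predict_id(embedding, prototypes):
    """Return the index of the prototype with the highest cosine similarity."""
    e = l2_normalize(embedding)
    sims = [dot(e, l2_normalize(p)) for p in prototypes]
    return max(range(len(prototypes)), key=lambda i: sims[i])


# Toy data (hypothetical): three prototypes in a 2-D latent space.
prototypes = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
query = [0.9, 0.1]

loss_correct = id_recognition_loss(query, prototypes, target_id=0)
loss_wrong = id_recognition_loss(query, prototypes, target_id=1)
predicted = predict_id(query, prototypes)
```

In this sketch the loss is small when the embedding lies close to the prototype of its assigned ID and large otherwise, and the prototype table serves as a shared reference frame for all instances regardless of batch composition.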