Large language models (LLMs) are changing how customers search for and choose products. LLM-based search engines, also called generative engines, recommend products to users directly, whereas traditional online search returns a list of results that users must explore themselves. However, these recommendations are strongly influenced by the order in which the LLM initially retrieves content, which limits the visibility of small businesses and independent creators and puts them at a disadvantage. In this work, we propose CORE, an optimization method that \textbf{C}ontrols \textbf{O}utput \textbf{R}ankings in g\textbf{E}nerative Engines for LLM-based search. Because the LLM's interactions with the search engine are black-box, CORE targets the content returned by the search engine as the primary means of influencing output rankings. Specifically, CORE appends strategically designed optimization content to the retrieved content to steer the ranking of outputs. We introduce three types of optimization content (string-based, reasoning-based, and review-based) and demonstrate their effectiveness in shaping output rankings. To evaluate CORE in realistic settings, we introduce ProductBench, a large-scale benchmark with 15 product categories and 200 products per category, where each product is associated with its top-10 recommendations collected from Amazon's search interface. Extensive experiments on four LLMs with search capabilities (GPT-4o, Gemini-2.5, Claude-4, and Grok-3) show that CORE achieves an average Promotion Success Rate of \textbf{91.4\% @Top-5}, \textbf{86.6\% @Top-3}, and \textbf{80.3\% @Top-1} across the 15 product categories, outperforming existing ranking manipulation methods while preserving the fluency of the optimized content.