By employing large language models (LLMs) to retrieve documents and generate natural language responses, generative engines such as Google AI Overview and ChatGPT provide significantly enhanced user experiences and have rapidly become the new form of search. Their rapid adoption also drives the need for Generative Engine Optimization (GEO), as content providers are eager to gain more traction from them. In this paper, we introduce AutoGEO, a framework that automatically learns generative engine preferences for using retrieved content in response generation and rewrites web content to gain more such traction. AutoGEO first prompts frontier LLMs to explain generative engine preferences and extracts meaningful preference rules from these explanations. It then uses the preference rules as context engineering for AutoGEO$_\text{API}$, a prompt-based GEO system, and as rule-based rewards to train AutoGEO$_\text{Mini}$, a cost-effective GEO model. Experiments on the standard GEO-Bench and two newly constructed benchmarks using real user queries demonstrate the effectiveness of AutoGEO in enhancing content traction while preserving search utility. Analyses confirm the learned rules' robustness and their ability to capture distinct preferences across domains, as well as the AutoGEO systems' ability to embed them in content optimization. The code is released at https://github.com/cxcscmu/AutoGEO.