The advent of large language models (LLMs) has ushered in a new paradigm of search engines that use generative models to gather and summarize information to answer user queries. This emerging technology, which we formalize under the unified framework of Generative Engines (GEs), has the potential to generate accurate and personalized responses, and is rapidly replacing traditional search engines like Google and Bing. Generative Engines typically satisfy queries by synthesizing information from multiple sources and summarizing it with the help of LLMs. While this shift significantly improves \textit{user} utility and \textit{generative search engine} traffic, it poses a major challenge for the third stakeholder: website and content creators. Given the black-box and fast-moving nature of Generative Engines, content creators have little to no control over when and how their content is displayed. With Generative Engines here to stay, the right tools should be provided to ensure that the creator economy is not severely disadvantaged. To address this, we introduce Generative Engine Optimization (GEO), a novel paradigm that helps content creators improve the visibility of their content in Generative Engine responses through a black-box optimization framework for defining and optimizing visibility metrics. We facilitate systematic evaluation in this new paradigm by introducing GEO-bench, a benchmark of diverse user queries across multiple domains, coupled with the sources required to answer these queries. Through rigorous evaluation, we show that GEO can boost visibility by up to 40\% in Generative Engine responses. Moreover, we show that the efficacy of these strategies varies across domains, underscoring the need for domain-specific methods. Our work opens a new frontier in the field of information discovery systems, with profound implications for Generative Engines and content creators.