Generative search engines increasingly determine whether online information is merely discoverable, cited as a source, or actually absorbed into generated answers. This paper proposes a two-stage measurement framework for Generative Engine Optimization (GEO): citation selection, where a platform triggers search and chooses sources, and citation absorption, where a cited page contributes language, evidence, structure, or factual support to the final answer. We analyze the public geo-citation-lab dataset covering 602 controlled prompts across ChatGPT, Google AI Overview/Gemini, and Perplexity; 21,143 valid search-layer citations; 23,745 citation-level feature records; 18,151 successfully fetched pages; and 72 extracted features. The central descriptive finding is that citation breadth and citation depth diverge. Perplexity and Google cite more sources on average, while ChatGPT cites fewer sources but shows substantially higher average citation influence among fetched pages. High-influence pages tend to be longer, more structured, semantically aligned, and richer in extractable evidence such as definitions, numerical facts, comparisons, and procedural steps. The results suggest that GEO should be measured beyond citation counts, with answer-level absorption treated as a separate outcome.
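As a rough illustration of the two-stage distinction, the sketch below computes a breadth metric (mean cited sources per prompt) and a depth metric (mean influence among fetched pages) per platform. It is a minimal sketch under stated assumptions, not the paper's implementation: the `Citation` record, its field names, the `influence` score scale, and the toy rows are all hypothetical and are not taken from the geo-citation-lab dataset.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Citation:
    """One search-layer citation (hypothetical schema, for illustration only)."""
    platform: str
    prompt_id: str
    url: str
    influence: Optional[float]  # assumed absorption score; None if the page was not fetched


def citation_breadth(citations: List[Citation]) -> Dict[str, float]:
    """Stage 1 (selection): mean number of cited sources per prompt, by platform.

    Prompts that produced no citations do not appear in the input, so they
    are not counted here; a fuller analysis would need the prompt list too.
    """
    per_prompt = defaultdict(int)
    for c in citations:
        per_prompt[(c.platform, c.prompt_id)] += 1
    by_platform = defaultdict(list)
    for (platform, _prompt), count in per_prompt.items():
        by_platform[platform].append(count)
    return {p: mean(counts) for p, counts in by_platform.items()}


def citation_depth(citations: List[Citation]) -> Dict[str, float]:
    """Stage 2 (absorption): mean influence over fetched pages only, by platform."""
    by_platform = defaultdict(list)
    for c in citations:
        if c.influence is not None:  # skip pages that could not be fetched
            by_platform[c.platform].append(c.influence)
    return {p: mean(scores) for p, scores in by_platform.items()}


# Toy rows with made-up values, chosen only to show breadth and depth diverging.
toy = [
    Citation("platform_a", "p1", "https://example.com/1", 0.75),
    Citation("platform_a", "p2", "https://example.com/2", 0.25),
    Citation("platform_b", "p1", "https://example.com/3", 0.5),
    Citation("platform_b", "p1", "https://example.com/4", None),
    Citation("platform_b", "p1", "https://example.com/5", 0.25),
    Citation("platform_b", "p2", "https://example.com/6", 0.0),
]

breadth = citation_breadth(toy)
depth = citation_depth(toy)
```

In this toy input, `platform_b` cites more sources per prompt while `platform_a` has the higher mean influence, which is the kind of divergence the two metrics are meant to keep separate. Reporting them as two outcomes, rather than folding them into one score, follows the abstract's suggestion that absorption be treated apart from citation counts.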