Generative Search Engines (GSEs) synthesize conversational answers from multiple sources, weakening the long-standing link between search ranking and digital visibility. This shift raises a central question for content creators: how can we reliably quantify a source article's influence on a GSE's synthesized answer across diverse intents and follow-up questions? We introduce CC-GSEO-Bench, a content-centric benchmark that couples a large-scale dataset with a creator-centered evaluation framework. The dataset contains over 1,000 source articles and over 5,000 query-article pairs, organized in a one-to-many structure (each article is paired with a cluster of queries) to support article-level evaluation. We ground dataset construction in realistic retrieval: we combine seed queries from public QA datasets with a limited amount of synthetic query augmentation, and retain only queries whose paired source article reappears in a follow-up retrieval step. On top of this dataset, we operationalize influence along three core dimensions (Exposure, Faithful Credit, and Causal Impact) and two content-quality dimensions (Readability and Structure; Trustworthiness and Safety). We aggregate query-level signals over each article's query cluster to summarize influence strength, coverage, and stability, and use these summaries to empirically characterize influence dynamics across representative content patterns.
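The abstract names two computations without defining them: a retrieval-consistency filter over query-article pairs, and an article-level aggregation of query-level signals into strength, coverage, and stability. The sketch below is one possible reading under stated assumptions, not the benchmark's actual method. The function names `retrieval_consistent` and `aggregate_article`, the `followup_retrieve` callback, the `[0, 1]` score range, the `hit_threshold` parameter, and the specific formulas are all hypothetical. Strength is taken as the mean score, coverage as the share of queries at or above a threshold, and stability as one minus twice the population standard deviation.

```python
from statistics import mean, pstdev
from typing import Callable, Dict, Iterable, List, Tuple


def retrieval_consistent(
    pairs: Iterable[Tuple[str, str]],
    followup_retrieve: Callable[[str], List[str]],
) -> List[Tuple[str, str]]:
    """Keep (query, article_id) pairs whose article reappears in a follow-up retrieval.

    `followup_retrieve` is a hypothetical stand-in for whatever retriever is used;
    it maps a query to the list of retrieved article ids.
    """
    return [(q, a) for q, a in pairs if a in followup_retrieve(q)]


def aggregate_article(scores: List[float], hit_threshold: float = 0.5) -> Dict[str, float]:
    """Summarize one article's query cluster (assumed formulas, for illustration only).

    `scores` are hypothetical per-query influence scores in [0, 1], e.g. one of
    Exposure, Faithful Credit, or Causal Impact, computed for each query paired
    with the article.
    """
    if not scores:
        raise ValueError("an article needs at least one paired query")
    # Strength: average influence across the article's queries.
    strength = mean(scores)
    # Coverage: fraction of queries where influence reaches the (assumed) threshold.
    coverage = sum(s >= hit_threshold for s in scores) / len(scores)
    # Stability: for values in [0, 1] the population std is at most 0.5,
    # so 1 - 2 * std lies in [0, 1]; higher means more consistent influence.
    stability = 1.0 - 2.0 * pstdev(scores)
    return {"strength": strength, "coverage": coverage, "stability": stability}


# Usage with a toy in-memory "retriever" and made-up scores.
index = {"q1": ["a1", "a3"], "q2": ["a4"]}
kept = retrieval_consistent([("q1", "a1"), ("q2", "a2")], lambda q: index.get(q, []))
print(kept)  # [('q1', 'a1')]
print(aggregate_article([0.8, 0.6, 0.2, 1.0]))
```

Summarizing each article with separate strength, coverage, and stability values, rather than a single mean, keeps an article that dominates one query but is absent elsewhere distinguishable from one with moderate, consistent influence across its whole cluster.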