We introduce PeReGrINE, a benchmark and evaluation framework for personalized review generation grounded in graph-structured user--item evidence. PeReGrINE restructures Amazon Reviews 2023 into a temporally consistent bipartite graph, where each target review is conditioned on bounded evidence from user history, item context, and neighborhood interactions under explicit temporal cutoffs. To represent persistent user preferences without conditioning directly on sparse raw histories, we compute a User Style Parameter that summarizes each user's linguistic and affective tendencies over prior reviews. This setup supports controlled comparison of four graph-derived retrieval settings: product-only, user-only, neighbor-only, and combined evidence. Beyond standard generation metrics, we introduce Dissonance Analysis, a macro-level evaluation framework that measures deviation from expected user style and product-level consensus. We also study visual evidence as an auxiliary context source and find that it can improve textual quality in some settings, while graph-derived evidence remains the main driver of personalization and consistency. Across product categories, PeReGrINE offers a reproducible way to study how evidence composition affects review fidelity, personalization, and grounding in retrieval-conditioned language models.
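To make the temporal-cutoff setup concrete, here is a minimal sketch of assembling bounded evidence for a target review under the four retrieval settings. It is written under stated assumptions and is not PeReGrINE's actual implementation. The `Review` record, the `build_evidence` function, the per-source cap `k`, the recency ordering, and the neighbor definition are all hypothetical. Here, neighbors are users who reviewed one of the target user's earlier items before the cutoff. The User Style Parameter and visual evidence are omitted.

```python
from dataclasses import dataclass
from enum import Enum


@dataclass(frozen=True)
class Review:
    # Hypothetical record: one user--item edge in the bipartite graph.
    user: str
    item: str
    timestamp: int  # assumed integer time, e.g. Unix seconds
    text: str


class Setting(Enum):
    PRODUCT_ONLY = "product"
    USER_ONLY = "user"
    NEIGHBOR_ONLY = "neighbor"
    COMBINED = "combined"


def build_evidence(reviews, target, setting, k=3):
    """Return evidence lists for `target`, keeping only reviews before its timestamp.

    Hypothetical helper: each evidence source is capped at `k` reviews,
    most recent first.
    """
    # Temporal cutoff: strictly earlier reviews only (this also excludes the target itself).
    prior = sorted(
        (r for r in reviews if r.timestamp < target.timestamp),
        key=lambda r: r.timestamp,
        reverse=True,
    )
    # User history: the target user's own earlier reviews.
    user_hist = [r for r in prior if r.user == target.user]
    # Item context: other users' earlier reviews of the target item.
    item_ctx = [r for r in prior if r.item == target.item and r.user != target.user]
    # Neighborhood (assumed definition): users who co-reviewed an item with the
    # target user before the cutoff; keep their reviews of other items.
    user_items = {r.item for r in user_hist}
    neighbor_users = {r.user for r in prior if r.item in user_items and r.user != target.user}
    neighbors = [r for r in prior if r.user in neighbor_users and r.item != target.item]

    sources = {"user": user_hist[:k], "item": item_ctx[:k], "neighbor": neighbors[:k]}
    if setting is Setting.PRODUCT_ONLY:
        return {"item": sources["item"]}
    if setting is Setting.USER_ONLY:
        return {"user": sources["user"]}
    if setting is Setting.NEIGHBOR_ONLY:
        return {"neighbor": sources["neighbor"]}
    return sources  # Setting.COMBINED
```

Calling `build_evidence(reviews, target, Setting.COMBINED)` returns all three evidence sources, and the single-source settings return one. Because the cutoff is applied before any source is built, no setting can see reviews written after the target. That property keeps the four settings comparable.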