Conversational Recommender Systems (CRSs) have attracted growing attention for their ability to deliver personalized recommendations through natural language interactions. To more accurately infer user preferences from multi-turn conversations, recent works increasingly expand the conversational context (e.g., by incorporating diverse entity information or retrieving related dialogues). While such context enrichment can assist preference modeling, it also produces longer and more heterogeneous inputs, leading to practical issues such as input length constraints, inconsistent text style, and irrelevant textual noise, thereby demanding stronger language-understanding capability. In this paper, we propose STARCRS, a Screen-Text-AwaRe Conversational Recommender System that integrates two complementary text understanding modes: (1) a screen-reading pathway that encodes auxiliary textual information as visual tokens, mimicking skim reading on a screen, and (2) an LLM-based textual pathway that focuses on a limited set of critical content for fine-grained reasoning. We design a knowledge-anchored fusion framework that combines contrastive alignment, cross-attention interaction, and adaptive gating to integrate the two modes for improved preference modeling and response generation. Extensive experiments on two widely used benchmarks demonstrate that STARCRS consistently improves both recommendation accuracy and the quality of generated responses.
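The fusion of the two pathways described above can be illustrated with a minimal numerical sketch. The code below is a hypothetical, simplified illustration (not the authors' implementation): textual-pathway features attend over screen-reading "visual token" features via cross-attention, and an adaptive gate (with an assumed learnable weight `W_g`) mixes the attended context back into the textual representation.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cross_attention(q, kv, d):
    # q:  (n_q, d)  queries from the LLM-based textual pathway
    # kv: (n_kv, d) keys/values from the screen-reading pathway
    scores = q @ kv.T / np.sqrt(d)          # scaled dot-product scores
    return softmax(scores, axis=-1) @ kv    # attended screen context

rng = np.random.default_rng(0)
d = 16
text_tokens = rng.normal(size=(4, d))    # fine-grained textual features
screen_tokens = rng.normal(size=(8, d))  # "visual token" screen features

attended = cross_attention(text_tokens, screen_tokens, d)

# adaptive gating: per-token scalar weight deciding how much attended
# screen context to mix in (W_g is a hypothetical learnable parameter)
W_g = rng.normal(size=(2 * d, 1)) * 0.1
gate = sigmoid(np.concatenate([text_tokens, attended], axis=-1) @ W_g)
fused = gate * text_tokens + (1.0 - gate) * attended  # (4, d) fused features
```

In a trained system, `fused` would feed the downstream preference-modeling and response-generation heads; here the weights are random and serve only to show the data flow.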