Using Large Language Models (LLMs) to generate semantic features has been demonstrated as a powerful paradigm for enhancing Sequential Recommender Systems (SRS). This typically involves three stages: processing item text, extracting features with LLMs, and adapting them for downstream models. However, existing methods vary widely in prompting, architecture, and adaptation strategies, making it difficult to fairly compare design choices and identify what truly drives performance. In this work, we propose RecXplore, a modular analytical framework that decomposes the LLM-as-feature-extractor pipeline into four modules: data processing, semantic feature extraction, feature adaptation, and sequential modeling. Instead of proposing new techniques, RecXplore revisits and organizes established methods, enabling systematic exploration of each module in isolation. Experiments on four public datasets show that simply combining the best designs from existing techniques without exhaustive search yields up to 18.7% relative improvement in NDCG@5 and 12.7% in HR@5 over strong baselines. These results underscore the utility of modular benchmarking for identifying effective design patterns and promoting standardized research in LLM-enhanced recommendation.
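The four-module decomposition can be illustrated with a minimal sketch. All names below are hypothetical: the abstract does not specify RecXplore's interfaces, and the toy encoder, adapter, and sequence model merely stand in for a real LLM embedding call, a learned adapter, and a transformer-based sequential recommender.

```python
from typing import Callable

def process_text(item: dict) -> str:
    """Module 1 (data processing): flatten item metadata into a text prompt."""
    return f"Title: {item['title']}. Category: {item['category']}."

def extract_features(text: str, encoder: Callable[[str], list[float]]) -> list[float]:
    """Module 2 (semantic feature extraction): encode the prompt.
    `encoder` stands in for a real LLM embedding call."""
    return encoder(text)

def adapt_features(emb: list[float], dim: int) -> list[float]:
    """Module 3 (feature adaptation): map to the downstream model's
    dimensionality (toy pad/truncate in place of a learned adapter)."""
    return (emb + [0.0] * dim)[:dim]

def model_sequence(seq: list[list[float]]) -> list[float]:
    """Module 4 (sequential modeling): toy mean-pool over the item
    sequence, standing in for e.g. a transformer-based SRS backbone."""
    n = len(seq)
    return [sum(col) / n for col in zip(*seq)]

# Toy "LLM": a deterministic pseudo-embedding from character codes.
toy_llm = lambda text: [float(ord(c) % 7) for c in text[:8]]

items = [{"title": "Dune", "category": "Sci-Fi"},
         {"title": "Foundation", "category": "Sci-Fi"}]
item_embs = [adapt_features(extract_features(process_text(i), toy_llm), dim=4)
             for i in items]
user_repr = model_sequence(item_embs)  # 4-dimensional user representation
```

Because each stage is a separate function, any one module (e.g. the prompt format in `process_text` or the adapter in `adapt_features`) can be swapped and evaluated in isolation, which is the kind of controlled comparison the framework enables.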