Multimodal recommender systems (MMRS) leverage images, text, and interaction signals to enrich item representations. However, recent alignment-based MMRS that enforce a unified embedding space often blur modality-specific structures and exacerbate ID dominance. To address this, we propose AnchorRec, a multimodal recommendation framework that performs indirect, anchor-based alignment in a lightweight projection domain. By decoupling alignment from representation learning, AnchorRec preserves each modality's native structure while maintaining cross-modal consistency and avoiding positional collapse. Experiments on four Amazon datasets show that AnchorRec achieves competitive top-N recommendation accuracy, while qualitative analyses demonstrate improved multimodal expressiveness and coherence. The codebase of AnchorRec is available at https://github.com/hun9008/AnchorRec.