Multimodal recommendation improves user modeling by integrating collaborative signals with heterogeneous item content. In real-world applications, user interests evolve over time and exhibit non-stationary dynamics, with different preference factors changing at different rates. This challenge is amplified in multimodal settings because visual and textual cues can dominate decisions under different temporal regimes. Despite strong progress, most multimodal recommenders still rely on static interaction graphs or coarse temporal heuristics, which limits their ability to model continuous preference evolution with fine-grained temporal adaptation. To address these limitations, we propose TimeMM, a time-conditioned spectral filtering framework for dynamic multimodal recommendation. TimeMM instantiates Time-as-Operator by mapping interaction recency to a family of parametric temporal kernels that reweight edges on the user-item graph, producing component-specific representations without explicit eigendecomposition. To capture non-stationary interests, we introduce Adaptive Spectral Filtering, which mixes the operator bank according to temporal context and yields prediction-specific effective spectral responses. To account for modality-specific temporal sensitivity, we further propose Spectral-Aware Modality Routing, which calibrates visual and textual contributions conditioned on the same temporal context. Finally, a ranking-space Spectral Diversity Regularization encourages complementary expert behaviors and prevents filter-bank collapse. Extensive experiments on real-world benchmarks demonstrate that TimeMM consistently outperforms state-of-the-art multimodal recommenders while maintaining linear-time scalability.
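The edge-reweighting and mixing idea described above can be illustrated with a toy sketch. This is not TimeMM's actual formulation: the exponential-decay kernel, the softmax gate, the fixed gate scores, and every function and variable name below are assumptions chosen for illustration. The sketch only shows how a bank of recency kernels can yield one user representation per kernel in a single pass over the edges (hence cost linear in the number of interactions, with no eigendecomposition), and how those components might then be mixed.

```python
import math

def temporal_kernel_bank(delta_t, decay_rates):
    # Assumed kernel family: exp(-rate * elapsed_time), one weight per rate.
    return [math.exp(-rate * delta_t) for rate in decay_rates]

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def propagate(edges, item_emb, decay_rates, now):
    """Aggregate item embeddings into per-kernel user embeddings.

    edges: list of (user, item, timestamp) interactions.
    Returns {user: [embedding for kernel 0, embedding for kernel 1, ...]}.
    One pass over the edges, so the cost is linear in len(edges).
    """
    dim = len(next(iter(item_emb.values())))
    n_k = len(decay_rates)
    sums, norms = {}, {}
    for user, item, ts in edges:
        weights = temporal_kernel_bank(now - ts, decay_rates)
        acc = sums.setdefault(user, [[0.0] * dim for _ in range(n_k)])
        nrm = norms.setdefault(user, [0.0] * n_k)
        for k, w in enumerate(weights):
            nrm[k] += w
            for d in range(dim):
                acc[k][d] += w * item_emb[item][d]
    # Normalize each component by its total edge weight (guard against underflow).
    return {
        u: [[v / max(norms[u][k], 1e-12) for v in acc[k]] for k in range(n_k)]
        for u, acc in sums.items()
    }

def mix_components(components, gate_scores):
    # Assumed gate: softmax over scores; a real model would compute the
    # scores from temporal context rather than take them as constants.
    gates = softmax(gate_scores)
    dim = len(components[0])
    return [sum(g * comp[d] for g, comp in zip(gates, components)) for d in range(dim)]

# Toy data: user u1 interacted with item "a" long ago and item "b" recently.
item_emb = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
edges = [("u1", "a", 0.0), ("u1", "b", 9.0)]
decay_rates = [0.0, 1.0]  # rate 0 ignores recency; rate 1 favors recent edges

components = propagate(edges, item_emb, decay_rates, now=10.0)
mixed = mix_components(components["u1"], gate_scores=[0.0, 0.0])
```

In this toy setup the zero-rate kernel averages both items equally, while the faster-decaying kernel is dominated by the recent item "b"; the gate then blends the two views. Swapping the constant gate scores for a function of temporal context would be the natural next step, but how TimeMM does this is not specified in the abstract.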