Recent findings in natural language processing (NLP) demonstrate that strong memorization capability contributes substantially to the success of Large Language Models (LLMs). This inspires us to explicitly bring an independent memory mechanism into click-through rate (CTR) ranking models to learn and memorize the representations of cross features. In this paper, we propose the multi-Hash Codebook NETwork (HCNet) as a memory mechanism for efficiently learning and memorizing representations of cross features in CTR tasks. HCNet uses a multi-hash codebook as its main memory store, and the whole memory procedure consists of three phases: multi-hash addressing, memory restoring, and feature shrinking. We also propose a new CTR model named MemoNet, which combines HCNet with a DNN backbone. Extensive experimental results on three public datasets and an online test show that MemoNet outperforms state-of-the-art approaches. Moreover, MemoNet exhibits the scaling law observed for large language models in NLP: we can enlarge the codebook in HCNet and continue to obtain performance gains. Our work demonstrates the importance and feasibility of learning and memorizing representations of cross features, which sheds light on a promising new research direction.
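The three phases named above can be illustrated with a small sketch. The abstract gives no implementation details, so everything below is an assumption made for illustration rather than the authors' method: the class `ToyHashCodebook`, the helper `multi_hash_addresses`, the use of seeded CRC32 as the hash family, the single linear map used for restoring, and mean pooling used for shrinking are all hypothetical choices.

```python
import itertools
import zlib

import numpy as np


def multi_hash_addresses(cross_feature, num_hashes, codebook_size):
    """Phase 1 (assumed form): map one cross-feature string to several codebook rows.

    Each "hash function" is CRC32 applied to the string prefixed with a
    different seed. This is an illustrative stand-in for a hash family.
    """
    return [
        zlib.crc32(f"{seed}:{cross_feature}".encode("utf-8")) % codebook_size
        for seed in range(num_hashes)
    ]


class ToyHashCodebook:
    """Hypothetical toy memory: a shared codebook addressed by multiple hashes."""

    def __init__(self, codebook_size=1024, code_dim=4, num_hashes=2, out_dim=8, seed=0):
        rng = np.random.default_rng(seed)
        self.codebook_size = codebook_size
        self.num_hashes = num_hashes
        # The memory itself: one small vector (codeword) per codebook row.
        self.codebook = rng.normal(scale=0.1, size=(codebook_size, code_dim))
        # Assumed restoring map: concatenated codewords -> one representation.
        self.restore = rng.normal(scale=0.1, size=(num_hashes * code_dim, out_dim))

    def lookup(self, cross_feature):
        """Phases 1 and 2: address the codebook, then restore a representation."""
        idx = multi_hash_addresses(cross_feature, self.num_hashes, self.codebook_size)
        codes = self.codebook[idx].reshape(-1)  # shape: (num_hashes * code_dim,)
        return codes @ self.restore  # shape: (out_dim,)

    def encode(self, fields):
        """Phase 3 (assumed form): shrink all pairwise cross features to one vector."""
        crosses = [f"{a}&{b}" for a, b in itertools.combinations(fields, 2)]
        restored = np.stack([self.lookup(c) for c in crosses])
        # Mean pooling is a placeholder for whatever shrinking the model uses.
        return restored.mean(axis=0)


memory = ToyHashCodebook()
vector = memory.encode(["user=42", "item=7", "hour=13"])
```

In this sketch, the resulting `vector` would be concatenated with the ordinary field embeddings and passed to the DNN backbone. Enlarging `codebook_size` reduces hash collisions between distinct cross features, which is one intuitive reading of why a larger codebook could keep improving performance.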