Despite massive investments in scale, deep models for click-through rate (CTR) prediction often exhibit rapidly diminishing returns, in stark contrast to the predictable scaling laws seen in large language models (LLMs). We identify the root cause as a fundamental \textit{structural misalignment}: standard Transformers assume sequential compositionality, whereas CTR data demand combinatorial reasoning over heterogeneous fields. To restore alignment, we introduce the \textbf{Field-Aware Transformer (FAT)}. By reconstructing the standard Transformer block with field-centric parameters, FAT achieves \textit{structured expressivity}, shifting the dependence of model complexity from the total vocabulary size $n$ to the number of fields $F$ ($n \gg F$). Crucially, to decouple model capacity from field cardinality, FAT employs a Basis-Composed Hypernetwork that synthesizes field-specific parameters from shared bases, further reducing parameter complexity. Theoretically, we ground this scaling behavior in a formal scaling law based on Rademacher complexity. Empirically, FAT outperforms existing state-of-the-art methods with up to \textbf{+4.38\%} AUC improvement, and delivers \textbf{+2.33\%} CTR and \textbf{+0.66\%} RPM in live production. Our work establishes that scalable recommendation arises not from size alone, but from \textit{structured expressivity}, that is, architectural coherence with data semantics.
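
As a rough illustration of why composing field-specific parameters from shared bases can reduce parameter complexity, consider the following sketch. The abstract does not give the exact parameterization, so the linear form below and the symbols $d$, $K$, $B_k$, and $\alpha_{f,k}$ are assumptions made only for this example. Suppose each field $f \in \{1, \dots, F\}$ requires a $d \times d$ weight matrix $W_f$. Learning these independently costs $F d^2$ parameters. If each $W_f$ is instead composed from $K$ shared $d \times d$ bases $B_k$ with scalar mixing coefficients $\alpha_{f,k}$,
\[
W_f = \sum_{k=1}^{K} \alpha_{f,k} \, B_k ,
\qquad
\underbrace{K d^2}_{\text{shared bases}} + \underbrace{F K}_{\text{coefficients}} \;<\; F d^2
\quad \Longleftrightarrow \quad
K < \frac{F d^2}{d^2 + F} .
\]
For instance, under these assumed values, $F = 100$, $d = 64$, and $K = 8$ give $8 \cdot 64^2 + 100 \cdot 8 = 33{,}568$ parameters, compared with $100 \cdot 64^2 = 409{,}600$ for independent per-field matrices.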