Forecasting Is Not Attribution: Localizing Decoder Bypass in Graph-Based Neural Marketing Mix Models

Marketing mix models (MMM) are used both to forecast business outcomes and to attribute those outcomes to marketing channels, but the two goals are not equivalent. We study a failure mode in graph-based neural MMM that we call attribution bypass: a high-capacity decoder can reach low forecasting error through target autoregression, dense communication, co-movement, context, or latent memory while failing to route counterfactual sensitivity through the graph used as the attribution object. We introduce DICE-MMM as a bounded diagnostic and training framework. We do not claim that observational neural MMM identifies causal effects. Instead, DICE separates three questions that are often conflated in graph-based MMM: graph recovery, forecasting accuracy, and whether the trained decoder's perturbation-induced influence is graph-aligned. Stage 1 trains a graph encoder with a restricted graph-mediated decoder. Stage 2 freezes the selected encoder and trains a graph-safe latent decoder whose cross-node communication must pass through the supplied graph. Decoder use of the graph is evaluated with CIG, AR-CIG, and graph-swap tests. Across controlled R/d/T swaps and an external multi-graph rawlog stress test, DICE improves stable graph recovery over CausalMMM. The experiments show that forecasting accuracy is not an attribution certificate: in a sparse-target benchmark, no-graph and full-graph decoders achieve MSE@7 around 0.004 while AR-CIG nAUPRC remains near or below zero, whereas an oracle graph reaches 0.807 ± 0.129 at comparable MSE. A frozen graph-swap test localizes the bottleneck: the same DICE-hard-trained decoder moves from an nAUPRC of -0.044 ± 0.006 under learned graph inputs to 0.894 ± 0.027 with the oracle graph. The contribution is a stress test and failure-localization framework showing that low MSE can hide attribution bypass and that the unresolved bottleneck is graph-support selection, not forecasting or decoder capacity.

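To make concrete what it means for a decoder's perturbation-induced influence to be graph-aligned, the following is a minimal illustrative sketch, not the paper's CIG or AR-CIG definition. It assumes a toy one-layer decoder, estimates influence with central finite differences, and scores alignment with a chance-normalized AUPRC of the form (AP - base) / (1 - base), where base is the edge density. That normalization, along with every function and variable name here, is an assumption made for illustration; it is chosen only because it can go negative, as the nAUPRC values reported above do.

```python
import numpy as np

def perturbation_influence(decoder, x, eps=1e-3):
    """influence[i, j] = |d output_j / d input_i|, by central finite differences."""
    n = x.shape[0]
    influence = np.zeros((n, n))
    for i in range(n):
        dx = np.zeros(n)
        dx[i] = eps
        influence[i] = np.abs(decoder(x + dx) - decoder(x - dx)) / (2 * eps)
    return influence

def average_precision(scores, labels):
    """Mean of precision@k taken at the ranks of the true edges."""
    order = np.argsort(-scores, kind="stable")
    hits = labels[order].astype(float)
    precision_at_k = np.cumsum(hits) / np.arange(1, hits.size + 1)
    return float((precision_at_k * hits).sum() / hits.sum())

def normalized_auprc(influence, graph):
    """Assumed normalization: 0 is chance level (edge density), 1 is perfect."""
    off_diagonal = ~np.eye(graph.shape[0], dtype=bool)  # ignore self-influence
    scores = influence[off_diagonal]
    labels = graph[off_diagonal].astype(bool)
    base = labels.mean()
    return (average_precision(scores, labels) - base) / (1.0 - base)

# Hypothetical 4-node chain graph: edge i -> j is stored at graph[i, j].
graph = np.zeros((4, 4))
graph[0, 1] = graph[1, 2] = graph[2, 3] = 1.0

# Graph-routed toy decoder: cross-node weights exist only on graph edges.
w_graph = 0.8 * graph
# Bypass toy decoder: cross-node weights exist only OFF the graph edges.
w_bypass = 0.8 * (1.0 - graph)
np.fill_diagonal(w_bypass, 0.0)

def make_decoder(weights):
    # output_j = tanh(sum_i weights[i, j] * x_i)
    return lambda x: np.tanh(weights.T @ x)

x = np.array([0.1, -0.2, 0.3, 0.05])
aligned = normalized_auprc(perturbation_influence(make_decoder(w_graph), x), graph)
bypass = normalized_auprc(perturbation_influence(make_decoder(w_bypass), x), graph)
print(f"graph-routed decoder: {aligned:.3f}")
print(f"bypass decoder:       {bypass:.3f}")
```

In this toy setting the graph-routed decoder scores at the top of the scale and the bypass decoder scores below zero, even though nothing in the sketch measures forecasting error. That is the separation the abstract draws between forecasting accuracy and graph-aligned influence.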