Predicting customers' long-term revenue from sparse and irregular transaction data is central to marketing resource allocation in non-contractual settings, yet existing approaches face a trade-off. Traditional probabilistic customer base models deliver robust long-horizon forecasts by imposing strong structural assumptions, while flexible machine-learning models often require substantial training data and careful tuning. We propose a variational-autoencoder-based model that preserves the process-based likelihood of established attrition-transaction-spend models conditional on customer heterogeneity, but replaces the restrictive parametric mixing distribution with a flexible latent representation learned by encoder-decoder networks. The resulting approach (i) provides a single model for customer attrition, transactions and spending, (ii) remains reliable when contextual covariates are unavailable, and (iii) flexibly incorporates rich covariates and nonlinear effects when they are available. This design balances structural stability with the flexibility needed to capture complex purchase dynamics. Across multiple real-world datasets and prediction horizons, the proposed model improves upon the latest benchmarks. Businesses benefit directly, as a better assessment of customers' future revenues improves the efficiency of campaign targeting. For research, this work provides guidance on how to embed domain-specific models into the variational autoencoder framework, enabling flexible representation learning while retaining an econometrically meaningful process structure.
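One way to make this construction concrete is the sketch below, which rests on assumptions and is not taken from the model's specification. Let $x_i$ denote customer $i$'s observed transaction and spend history, $\lambda_i$ the individual-level parameters of the attrition-transaction-spend process, and $p(x_i \mid \lambda_i)$ the process-based likelihood. The symbols $f$, $\eta$, $g_\theta$, $q_\phi$, $z_i$ and the standard normal prior are illustrative notation, not names from the source. A traditional probabilistic model integrates the likelihood over a parametric mixing distribution $f$. A variational-autoencoder formulation would instead maximize an evidence lower bound in which a decoder $g_\theta$ maps a latent variable to the process parameters:

```latex
% Traditional model: parametric mixing distribution f over individual parameters
p(x_i) = \int p(x_i \mid \lambda_i)\, f(\lambda_i \mid \eta)\, d\lambda_i

% VAE sketch (assumed notation): decoder g_\theta maps latent z_i to process parameters
\lambda_i = g_\theta(z_i), \qquad z_i \sim p(z) = \mathcal{N}(0, I)

% Evidence lower bound with the process likelihood as the reconstruction term
\log p_\theta(x_i) \;\ge\;
  \mathbb{E}_{q_\phi(z_i \mid x_i)}\!\left[ \log p\bigl(x_i \mid g_\theta(z_i)\bigr) \right]
  - \mathrm{KL}\!\left( q_\phi(z_i \mid x_i) \,\|\, p(z) \right)
```

Under this reading, the reconstruction term keeps the econometric process structure, while the encoder $q_\phi$ and decoder $g_\theta$ take over the role of the mixing distribution. Covariates, when available, could enter as additional inputs to either network.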