OptDist: Learning Optimal Distribution for Customer Lifetime Value Prediction

Customer Lifetime Value (CLTV) prediction is a critical task in business applications. Accurately predicting CLTV is challenging in real-world business scenarios because the distribution of CLTV is complex and mutable. First, a large number of users have no consumption at all, which produces a long-tailed distribution that is difficult to fit. Second, a small set of high-value users spend orders of magnitude more than a typical user, so CLTV spans a range that is hard to capture with a single distribution. Existing approaches to CLTV estimation either assume a prior probability distribution and fit a single group of distribution-related parameters for all samples, or learn directly from the posterior distribution using manually predefined buckets in a heuristic manner. However, these methods fail to handle complex and mutable distributions. In this paper, we propose OptDist, a novel optimal distribution selection model for CLTV prediction, which uses an adaptive optimal sub-distribution selection mechanism to improve the accuracy of complex distribution modeling. Specifically, OptDist trains several candidate sub-distribution networks in a distribution learning module (DLM) to model the probability distribution of CLTV. A distribution selection module (DSM) then selects a sub-distribution for each sample, making the selection automatic and adaptive. In addition, we design an alignment mechanism that connects the two modules and effectively guides the optimization. We conduct extensive experiments on two public datasets and one private dataset to verify that OptDist outperforms state-of-the-art baselines. Furthermore, OptDist has been deployed on a large-scale financial platform for customer acquisition marketing campaigns, and online experiments also demonstrate its effectiveness.
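
The interplay of the DLM, the DSM, and the alignment mechanism can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several details are assumptions the abstract does not state: each sub-distribution is taken to be a zero-inflated lognormal, both modules are reduced to untrained linear heads, and alignment is approximated by a cross-entropy that pushes the DSM toward the sub-distribution with the lowest loss for each sample. All names (`ziln_nll`, `W_dlm`, `W_dsm`) are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract row max for stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def ziln_nll(y, p, mu, sigma):
    """Per-sample negative log-likelihood of a zero-inflated lognormal.

    p is the probability of a positive value; mu and sigma parameterize
    the lognormal for the positive part. (Assumed form, not from the abstract.)
    """
    eps = 1e-12
    pos = y > 0
    safe_y = np.where(pos, y, 1.0)  # avoid log(0) in the unused branch
    log_pdf = (-np.log(safe_y * sigma * np.sqrt(2 * np.pi))
               - (np.log(safe_y) - mu) ** 2 / (2 * sigma ** 2))
    return np.where(pos, -(np.log(p + eps) + log_pdf), -np.log(1 - p + eps))

rng = np.random.default_rng(0)
n, d, K = 6, 4, 3                      # samples, features, candidate sub-distributions
x = rng.normal(size=(n, d))            # toy features
y = np.array([0.0, 0.0, 0.0, 1.5, 20.0, 900.0])  # toy CLTV: many zeros, heavy tail

# DLM: K heads, each emitting (logit of p, mu, log sigma) per sample.
W_dlm = rng.normal(scale=0.1, size=(K, d, 3))
raw = np.einsum("nd,kdj->nkj", x, W_dlm)          # shape (n, K, 3)
p = 1.0 / (1.0 + np.exp(-raw[..., 0]))
mu = raw[..., 1]
sigma = np.exp(raw[..., 2])
nll = ziln_nll(y[:, None], p, mu, sigma)          # shape (n, K)

# DSM: a distribution over the K candidates for each sample.
W_dsm = rng.normal(scale=0.1, size=(d, K))
select_probs = softmax(x @ W_dsm)                 # shape (n, K)

# Alignment (assumed form): the best-fitting candidate is the pseudo-label.
idx = np.arange(n)
pseudo_label = nll.argmin(axis=1)
align_loss = -np.log(select_probs[idx, pseudo_label]).mean()

# Inference: use the mean of the selected sub-distribution as the prediction.
chosen = select_probs.argmax(axis=1)
pred = p[idx, chosen] * np.exp(mu[idx, chosen] + sigma[idx, chosen] ** 2 / 2)
```

In a trained model, the DLM losses and the alignment loss would be minimized jointly by gradient descent; the sketch only shows how the quantities relate for a single forward pass.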
