Reinforcement learning has been explored for many problems, from video games with deterministic environments to portfolio and operations management, where scenarios are stochastic; however, there have been few attempts to test these methods on banking problems. In this study, we sought to find and automate an optimal credit card limit adjustment policy using reinforcement learning techniques. Because of the historical data available, we considered two possible actions per customer: increasing or maintaining an individual's current credit limit. To find this policy, we first formulated the decision as an optimization problem in which expected profit is maximized, which requires balancing two competing goals: maximizing the portfolio's revenue and minimizing the portfolio's provisions. Second, given the particularities of our problem, we used an offline learning strategy, simulating the impact of each action from historical data from a super-app in Latin America, to train our reinforcement learning agent. Our results, based on the proposed methodology involving synthetic experimentation, show that a Double Q-learning agent with optimized hyperparameters can outperform other strategies and generate a non-trivial optimal policy that not only reflects the complex nature of this decision but also offers an incentive to explore reinforcement learning in real-world banking scenarios. Our research establishes a conceptual structure for applying a reinforcement learning framework to credit limit adjustment, presenting an objective technique for making these decisions primarily with data-driven methods rather than relying only on expert-driven systems. We also study the use of alternative data for the problem of balance prediction, as the latter is a requirement of our proposed model. We find that the use of such data does not always bring prediction gains.
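To make the setup concrete, the following is a minimal sketch of tabular Double Q-learning trained offline on logged transitions, with the two actions named in the abstract (maintain or increase) and a reward taken as revenue minus provisions. Everything specific here is an assumption for illustration and is not taken from the study: the two coarse states (`low_risk`, `high_risk`), the toy revenue and provision figures, the per-step reward form, the tabular representation, and the hyperparameters `alpha`, `gamma`, and `epochs`.

```python
import random

ACTIONS = (0, 1)  # 0 = maintain the limit, 1 = increase it


def reward(revenue, provisions):
    """Assumed per-step profit: revenue minus provisions."""
    return revenue - provisions


def double_q_update(q_a, q_b, s, a, r, s_next, alpha, gamma, rng):
    """One tabular Double Q-learning step on a logged transition."""
    # Pick at random which table to update; the other evaluates the greedy action.
    if rng.random() < 0.5:
        q_upd, q_eval = q_a, q_b
    else:
        q_upd, q_eval = q_b, q_a
    # Action selection uses the table being updated ...
    best = max(ACTIONS, key=lambda x: q_upd.get((s_next, x), 0.0))
    # ... while its value comes from the other table, which reduces overestimation.
    target = r + gamma * q_eval.get((s_next, best), 0.0)
    old = q_upd.get((s, a), 0.0)
    q_upd[(s, a)] = old + alpha * (target - old)


def train(transitions, epochs=500, alpha=0.1, gamma=0.5, seed=0):
    """Sweep repeatedly over a fixed batch of transitions (offline learning)."""
    rng = random.Random(seed)
    q_a, q_b = {}, {}
    for _ in range(epochs):
        for s, a, revenue, provisions, s_next in transitions:
            r = reward(revenue, provisions)
            double_q_update(q_a, q_b, s, a, r, s_next, alpha, gamma, rng)
    return q_a, q_b


def greedy_policy(q_a, q_b, states):
    """Act greedily with respect to the sum of both tables."""
    return {
        s: max(ACTIONS, key=lambda x: q_a.get((s, x), 0.0) + q_b.get((s, x), 0.0))
        for s in states
    }


# Hypothetical logged data: (state, action, revenue, provisions, next_state).
transitions = [
    ("low_risk", 1, 10.0, 2.0, "low_risk"),
    ("low_risk", 0, 5.0, 1.0, "low_risk"),
    ("high_risk", 1, 6.0, 9.0, "high_risk"),
    ("high_risk", 0, 3.0, 2.0, "high_risk"),
]

q_a, q_b = train(transitions)
policy = greedy_policy(q_a, q_b, ["low_risk", "high_risk"])
print(policy)
```

On this toy batch the learned policy increases the limit only where the extra revenue outweighs the extra provisions. A realistic version would need a richer state (for example, predicted balance and risk features) and a simulator or off-policy correction for actions not observed in the historical data.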