Reinforcement learning has been explored for many problems, from video games with deterministic environments to portfolio and operations management with stochastic scenarios; however, there have been few attempts to test these methods on banking problems. In this study, we sought to find and automate an optimal credit card limit adjustment policy by employing reinforcement learning techniques. In particular, because of the historical data available, we considered two possible actions per customer: increasing or maintaining an individual's current credit limit. To find this policy, we first formulated the decision as an optimization problem in which the expected profit is maximized; this requires balancing two competing goals, maximizing the portfolio's revenue and minimizing the portfolio's provisions. Second, given the particularities of our problem, we used an offline learning strategy to train our reinforcement learning agent, simulating the impact of each action from historical data provided by a super-app in Latin America (i.e., a mobile application that offers various services, from goods deliveries to financial products). Our results show that a Double Q-learning agent with optimized hyperparameters can outperform other strategies and generate a non-trivial optimal policy reflecting the complex nature of this decision. Our research establishes a conceptual structure for applying a reinforcement learning framework to credit limit adjustment, presenting an objective technique for making these decisions primarily with data-driven methods rather than relying only on expert-driven systems. It also provides insights into the effect of using alternative data to determine these adjustments.
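As a rough illustration of the approach summarized above, the following is a minimal sketch of tabular Double Q-learning trained on a fixed batch of logged transitions with two actions (maintain or increase the limit). It is not the authors' implementation: the abstract does not specify the state representation, the reward computation, or the hyperparameters, so the two customer states, the reward values (standing in for revenue minus provisions), and the names `double_q_learning`, `greedy_policy`, `MAINTAIN`, and `INCREASE` are all assumptions made for this example.

```python
import random
from typing import List, Tuple

# (state, action, reward, next_state, done). Hypothetical layout for logged data.
Transition = Tuple[int, int, float, int, bool]

MAINTAIN, INCREASE = 0, 1  # the two actions considered per customer


def double_q_learning(
    transitions: List[Transition],
    n_states: int,
    n_actions: int = 2,
    alpha: float = 0.1,   # learning rate (assumed value)
    gamma: float = 0.9,   # discount factor (assumed value)
    epochs: int = 200,
    seed: int = 0,
) -> List[List[float]]:
    """Tabular Double Q-learning over a fixed (offline) batch of transitions."""
    rng = random.Random(seed)
    q_a = [[0.0] * n_actions for _ in range(n_states)]
    q_b = [[0.0] * n_actions for _ in range(n_states)]
    batch = list(transitions)
    for _ in range(epochs):
        rng.shuffle(batch)
        for s, a, r, s_next, done in batch:
            # Pick at random which table is updated; the other one evaluates.
            if rng.random() < 0.5:
                update, evaluate = q_a, q_b
            else:
                update, evaluate = q_b, q_a
            if done:
                target = r
            else:
                # The updated table selects the action, the other table values it.
                best = max(range(n_actions), key=lambda k: update[s_next][k])
                target = r + gamma * evaluate[s_next][best]
            update[s][a] += alpha * (target - update[s][a])
    # Report the average of both tables as the final action-value estimate.
    return [
        [(q_a[s][k] + q_b[s][k]) / 2 for k in range(n_actions)]
        for s in range(n_states)
    ]


def greedy_policy(q: List[List[float]]) -> List[int]:
    """Return the highest-valued action for every state."""
    return [max(range(len(row)), key=lambda k: row[k]) for row in q]


# Toy batch with invented numbers. State 0 is a hypothetical low-risk customer
# segment and state 1 a high-risk one; each reward stands in for
# revenue minus provisions.
batch: List[Transition] = [
    (0, MAINTAIN, 1.0, 0, True),
    (0, INCREASE, 2.0, 1, False),
    (1, MAINTAIN, 0.5, 1, True),
    (1, INCREASE, -1.0, 1, True),
]

q_values = double_q_learning(batch, n_states=2)
policy = greedy_policy(q_values)
print(policy)
```

In this toy setting the learned policy increases the limit for the low-risk state and maintains it for the high-risk one, because the assumed provisions outweigh the extra revenue in the latter case. A realistic version would replace the table with a function approximator over customer features and derive the rewards from simulated outcomes on historical data.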