Deep Reinforcement Learning-based Rebalancing Policies for Profit Maximization of Relay Nodes in Payment Channel Networks

From arXiv. Best Paper Award at the 4th International Conference on Mathematical Research for the Blockchain Economy (MARBLE 2023). 28 pages; minor language edits and fixes; acknowledgments added; results unchanged.

Payment channel networks (PCNs) are a layer-2 blockchain scalability solution whose main entity, the payment channel, enables transactions between pairs of nodes "off-chain," thus reducing the burden on the layer-1 network. Nodes with multiple channels can serve as relays for multihop payments by providing their liquidity and withholding part of the payment amount as a fee. Over time, relay nodes might end up with one or more unbalanced channels and thus need to trigger a rebalancing operation. In this paper, we study how a relay node can maximize its profits from fees by using the rebalancing method of submarine swaps. We introduce a stochastic model to capture the dynamics of a relay node observing random transaction arrivals and performing occasional rebalancing operations, and express the system evolution as a Markov Decision Process. We formulate the problem of maximizing the node's fortune over time across all rebalancing policies, and approximate the optimal solution by designing a Deep Reinforcement Learning (DRL)-based rebalancing policy. We build a discrete event simulator of the system and use it to demonstrate the DRL policy's superior performance under most conditions by conducting a comparative study of different policies and parameterizations. Our work is the first to introduce DRL for liquidity management in the complex world of PCNs.
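To make the setting concrete, the following is a minimal toy sketch of a relay node with two channels, random transaction arrivals, and a simple threshold rebalancing rule. It is not the paper's model, simulator, or DRL policy: the names `Channel`, `RelayNode`, and `simulate`, the fee rate, the flat swap cost, the 70/30 demand skew, and the threshold rule are all hypothetical choices made for illustration, and the swaps are simplified to instantaneous transfers between on-chain funds and a channel's local balance.

```python
import random
from dataclasses import dataclass


@dataclass
class Channel:
    local: float   # relay node's balance in this channel
    remote: float  # counterparty's balance in this channel


@dataclass
class RelayNode:
    left: Channel
    right: Channel
    on_chain: float          # relay node's on-chain funds
    fee_rate: float = 0.01   # hypothetical proportional relay fee
    swap_cost: float = 0.5   # hypothetical flat cost per swap

    def fortune(self) -> float:
        """Total funds the node owns, on-chain plus in its channels."""
        return self.left.local + self.right.local + self.on_chain

    def relay(self, amount: float, left_to_right: bool) -> bool:
        """Forward a payment, keeping a fee; fail if liquidity is missing."""
        inbound, outbound = (
            (self.left, self.right) if left_to_right else (self.right, self.left)
        )
        forwarded = amount * (1 - self.fee_rate)
        if inbound.remote < amount or outbound.local < forwarded:
            return False
        inbound.remote -= amount
        inbound.local += amount
        outbound.local -= forwarded
        outbound.remote += forwarded
        return True

    def swap_in(self, ch: Channel, amount: float) -> bool:
        """Toy swap-in: on-chain funds become local channel balance."""
        if amount <= 0 or self.on_chain < amount + self.swap_cost or ch.remote < amount:
            return False
        self.on_chain -= amount + self.swap_cost
        ch.local += amount
        ch.remote -= amount
        return True

    def swap_out(self, ch: Channel, amount: float) -> bool:
        """Toy swap-out: local channel balance becomes on-chain funds."""
        if amount <= self.swap_cost or ch.local < amount:
            return False
        ch.local -= amount
        ch.remote += amount
        self.on_chain += amount - self.swap_cost
        return True


def simulate(steps: int = 1000, seed: int = 0, threshold: float = 0.2):
    """Run the toy system under a hypothetical threshold rebalancing rule."""
    rng = random.Random(seed)
    node = RelayNode(Channel(50.0, 50.0), Channel(50.0, 50.0), on_chain=100.0)
    start, fees, swaps = node.fortune(), 0.0, 0
    for _ in range(steps):
        amount = rng.uniform(1.0, 10.0)
        left_to_right = rng.random() < 0.7  # skewed demand unbalances channels
        if node.relay(amount, left_to_right):
            fees += amount * node.fee_rate
        for ch in (node.left, node.right):
            capacity = ch.local + ch.remote
            if ch.local < threshold * capacity:
                swaps += node.swap_in(ch, 0.5 * capacity - ch.local)
            elif ch.local > (1 - threshold) * capacity:
                swaps += node.swap_out(ch, ch.local - 0.5 * capacity)
    return node, start, fees, swaps


if __name__ == "__main__":
    node, start, fees, swaps = simulate()
    print(f"fortune change: {node.fortune() - start:.2f} "
          f"(fees {fees:.2f}, swaps {swaps})")
```

In this sketch the node's fortune changes only through collected fees and paid swap costs, which is the trade-off a rebalancing policy has to manage: rebalance too rarely and payments fail for lack of liquidity, rebalance too often and swap costs eat the fee income. A learned policy would replace the fixed threshold rule inside `simulate`.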

