Deep reinforcement learning (DRL) has shown remarkable success in complex autonomous driving scenarios. However, DRL models inevitably incur high memory consumption and computational cost, which hinders their wide deployment on resource-limited autonomous driving devices. Structured pruning has been recognized as a useful method to compress and accelerate DRL models, but it is still challenging to estimate the contribution of a parameter (i.e., a neuron) to a DRL model. In this paper, we introduce a novel dynamic structured pruning approach that gradually removes a DRL model's unimportant neurons during the training stage. Our method consists of two steps: training the DRL model with a group sparse regularizer, and removing unimportant neurons with a dynamic pruning threshold. To efficiently train the DRL model with a small number of important neurons, we employ a neuron-importance group sparse regularizer. In contrast to conventional regularizers, this regularizer imposes a penalty on redundant groups of neurons that do not significantly influence the output of the DRL model. Furthermore, we design a novel structured pruning strategy that dynamically determines the pruning threshold and gradually removes unimportant neurons with a binary mask. As a result, our method not only removes redundant groups of neurons from the DRL model but also achieves high and robust performance. Experimental results show that the proposed method is competitive with existing DRL pruning methods on discrete control environments (i.e., CartPole-v1 and LunarLander-v2) and MuJoCo continuous environments (i.e., Hopper-v3 and Walker2D-v3). Specifically, our method removes $93\%$ of the neurons and $96\%$ of the weights of the DRL model in four challenging DRL environments with only slight accuracy degradation.
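The two steps described in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not give the exact form of the neuron-importance weighting or of the dynamic threshold rule, so everything here is an assumption made for illustration: the function names `group_sparse_penalty` and `prune_step`, the `(1 - importance)` weighting, and the linearly growing quantile threshold are hypothetical stand-ins rather than the authors' formulation.

```python
import numpy as np


def group_sparse_penalty(W, importance=None):
    """Group-lasso style penalty over neurons.

    W has shape (n_neurons, n_inputs); each row is one neuron's group of
    incoming weights. `importance` is an optional per-neuron score in [0, 1].
    Assumption: more important neurons receive a smaller penalty.
    """
    norms = np.linalg.norm(W, axis=1)  # one L2 norm per neuron
    if importance is None:
        return float(norms.sum())
    return float(((1.0 - importance) * norms).sum())


def prune_step(W, mask, step, total_steps, final_sparsity):
    """Update a binary neuron mask using a threshold that changes over time.

    Assumption: the target sparsity grows linearly with `step`, and the
    threshold is the matching quantile of the current neuron norms.
    A neuron that has been pruned stays pruned.
    """
    norms = np.linalg.norm(W * mask[:, None], axis=1)
    target = final_sparsity * min(1.0, step / total_steps)
    threshold = np.quantile(norms, target)
    return mask * (norms > threshold).astype(mask.dtype)


def run_schedule(W, total_steps, final_sparsity):
    """Apply prune_step once per step and return the final mask."""
    mask = np.ones(W.shape[0])
    for step in range(1, total_steps + 1):
        # In real training, gradient updates on (task loss + penalty)
        # would happen here, between successive pruning steps.
        mask = prune_step(W, mask, step, total_steps, final_sparsity)
    return mask
```

In an actual training loop, the penalty would be added to the DRL loss so that the norms of unimportant neurons shrink before each mask update; the sketch omits training and only shows how a binary mask can tighten gradually instead of being applied in one shot.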