Mixed-precision quantization offers superior performance to fixed-precision quantization and has been widely used in signal processing, communication systems, and machine learning. Bit allocation is essential to mixed-precision quantization. In this paper, we therefore propose a new bit allocation framework for mixed-precision quantization from a search perspective. First, we formulate a general bit allocation problem for mixed-precision quantization. We then introduce the penalized particle swarm optimization (PPSO) algorithm to address the integer consumption constraint. To improve efficiency and avoid iterating on infeasible solutions within PPSO, we further propose a greedy criterion particle swarm optimization (GC-PSO) algorithm, and we derive its convergence analysis based on dynamical system theory. Finally, we apply the framework to three classic applications: finite impulse response (FIR) filters, receivers, and gradient descent. Numerical examples in each application demonstrate the superiority of the proposed framework over existing algorithms.
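To make the idea of a penalized particle swarm search over integer bit allocations concrete, the following is a minimal sketch and not the paper's PPSO or GC-PSO. Everything specific in it is an assumption made for illustration: the toy distortion model (per-component noise proportional to `2^(-2b)`), the linear penalty on exceeding the bit budget, the rounding of continuous particle positions to integers, the hyperparameter values, and all function names. One particle is started at the minimum-bit allocation so that the swarm always holds a feasible point, which is a convenience of this sketch.

```python
import random


def quantization_cost(bits, weights):
    # Toy distortion model (assumption): a b-bit uniform quantizer
    # contributes noise power proportional to 2^(-2b).
    return sum(w * 2.0 ** (-2 * b) for w, b in zip(weights, bits))


def penalized_cost(bits, weights, budget, penalty):
    # Linear penalty (assumption) on every bit spent beyond the budget.
    excess = max(0, sum(bits) - budget)
    return quantization_cost(bits, weights) + penalty * excess


def to_bits(position, b_min, b_max):
    # Map a continuous particle position to an integer bit allocation.
    return [min(b_max, max(b_min, round(x))) for x in position]


def penalized_pso(weights, budget, b_min=1, b_max=8, n_particles=30,
                  n_iters=200, inertia=0.7, c1=1.5, c2=1.5,
                  penalty=10.0, seed=0):
    rng = random.Random(seed)
    dim = len(weights)

    def evaluate(position):
        return penalized_cost(to_bits(position, b_min, b_max),
                              weights, budget, penalty)

    # Random initial swarm; particle 0 starts at the minimum-bit allocation,
    # which is feasible whenever budget >= dim * b_min.
    pos = [[rng.uniform(b_min, b_max) for _ in range(dim)]
           for _ in range(n_particles)]
    pos[0] = [float(b_min)] * dim
    vel = [[0.0] * dim for _ in range(n_particles)]

    pbest = [p[:] for p in pos]
    pbest_cost = [evaluate(p) for p in pos]
    g = min(range(n_particles), key=pbest_cost.__getitem__)
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Standard PSO velocity update: inertia, personal pull, global pull.
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Keep positions inside the allowed bit range.
                pos[i][d] = min(b_max, max(b_min, pos[i][d] + vel[i][d]))
            cost = evaluate(pos[i])
            if cost < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], cost
                if cost < gbest_cost:
                    gbest, gbest_cost = pos[i][:], cost

    return to_bits(gbest, b_min, b_max), gbest_cost


if __name__ == "__main__":
    # Hypothetical sensitivities of four quantized components, 12-bit budget.
    bits, cost = penalized_pso([4.0, 2.0, 1.0, 0.5], budget=12)
    print(bits, cost)
```

In this sketch the penalty only discourages infeasible allocations; particles may still visit them, which is the inefficiency the abstract says GC-PSO is designed to avoid.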