This study explores quantisation-aware training (QAT) for time series Transformer models. We propose a novel adaptive quantisation scheme that dynamically selects between symmetric and asymmetric quantisation during the QAT phase. Our approach demonstrates that matching the quantisation scheme to the underlying data distribution can reduce computational overhead while maintaining acceptable precision. Moreover, the approach remains robust when applied to real-world data and to mixed-precision quantisation in which most objects are quantised to 4 bits. Our findings inform model quantisation and deployment decisions and provide a foundation for further advances in quantisation techniques.
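The abstract does not state how the scheme is chosen, so the following is only a minimal sketch of the two schemes being switched between. It assumes per-tensor min/max calibration, a 4-bit grid, and a hypothetical selection rule based on how skewed the value range is around zero. The names `choose_scheme`, `skew_threshold` and `adaptive_fake_quantise` are illustrative and are not taken from the study. Only the QAT forward pass (quantise, then dequantise) is shown; gradient handling such as a straight-through estimator is omitted.

```python
"""Sketch: per-tensor fake quantisation with a symmetric/asymmetric switch.

Assumption: the range-skew rule in choose_scheme is a hypothetical stand-in,
not the scheme-selection criterion used in the study.
"""
from typing import List, Sequence, Tuple

# (scale, zero_point, qmin, qmax) describing an integer grid
QParams = Tuple[float, int, int, int]


def symmetric_qparams(xs: Sequence[float], bits: int) -> QParams:
    # Zero-point fixed at 0; signed grid restricted to [-(2^(b-1)-1), 2^(b-1)-1].
    qmax = 2 ** (bits - 1) - 1
    amax = max(abs(x) for x in xs)
    scale = amax / qmax if amax > 0 else 1.0
    return scale, 0, -qmax, qmax


def asymmetric_qparams(xs: Sequence[float], bits: int) -> QParams:
    # Unsigned grid [0, 2^b - 1] with a zero-point; the range is widened to
    # include 0.0 so that zero stays exactly representable.
    qmin, qmax = 0, 2 ** bits - 1
    lo, hi = min(min(xs), 0.0), max(max(xs), 0.0)
    scale = (hi - lo) / (qmax - qmin) if hi > lo else 1.0
    zero_point = min(qmax, max(qmin, round(qmin - lo / scale)))
    return scale, zero_point, qmin, qmax


def fake_quantise(xs: Sequence[float], p: QParams) -> List[float]:
    # Round to the integer grid, clamp, then map back to floats.
    scale, zp, qmin, qmax = p
    return [(min(qmax, max(qmin, round(x / scale) + zp)) - zp) * scale for x in xs]


def choose_scheme(xs: Sequence[float], skew_threshold: float = 0.5) -> str:
    # Hypothetical rule: if the smaller tail is below skew_threshold times the
    # larger tail, a symmetric grid wastes many levels, so pick asymmetric.
    neg, pos = abs(min(min(xs), 0.0)), max(max(xs), 0.0)
    big = max(neg, pos)
    if big == 0.0:
        return "symmetric"
    return "symmetric" if min(neg, pos) / big >= skew_threshold else "asymmetric"


def adaptive_fake_quantise(
    xs: Sequence[float], bits: int = 4, skew_threshold: float = 0.5
) -> Tuple[str, List[float]]:
    scheme = choose_scheme(xs, skew_threshold)
    if scheme == "symmetric":
        params = symmetric_qparams(xs, bits)
    else:
        params = asymmetric_qparams(xs, bits)
    return scheme, fake_quantise(xs, params)


if __name__ == "__main__":
    relu_like = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0]    # non-negative, skewed range
    centred = [-3.0, -1.5, -0.5, 0.0, 0.5, 1.5, 3.0]  # roughly symmetric range
    print(adaptive_fake_quantise(relu_like)[0])  # asymmetric
    print(adaptive_fake_quantise(centred)[0])    # symmetric
```

One motivation for preferring the symmetric scheme where the data allow it is that a zero-point of 0 removes the zero-point correction terms from integer matrix multiplication. The asymmetric scheme, in contrast, spends all 2^b levels on one-sided data such as post-ReLU or softmax outputs.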