Compressed neural networks have the potential to enable deep learning across new applications and smaller computational environments. However, the range of learning tasks in which such models can succeed is not well studied. In this work, we apply sparse and binary-weighted Transformers to multivariate time series problems, showing that the lightweight models achieve accuracy comparable to that of dense floating-point Transformers of the same structure. Our model achieves favorable results across three time series learning tasks: classification, anomaly detection, and single-step forecasting. Additionally, to reduce the computational complexity of the attention mechanism, we apply two modifications that cause little to no decline in model performance: 1) in the classification task, we apply a fixed mask to the query, key, and value activations, and 2) for forecasting and anomaly detection, which rely on predicting outputs at a single point in time, we propose an attention mask that allows computation only at the current time step. Together, the compression techniques and attention modifications substantially reduce the number of non-zero operations necessary in the Transformer. We measure the computational savings of our approach over a range of metrics, including parameter count, bit size, and floating-point operation (FLOP) count, showing up to a 53x reduction in storage size and up to a 10.5x reduction in FLOPs.
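To illustrate why restricting attention to the current time step saves computation, the following is a minimal sketch and not the authors' implementation. It assumes that "computation only at the current time step" means that only the query at the last position attends to the keys and values, so the cost falls from roughly T² to T dot products for a sequence of length T. The function names `full_attention` and `current_step_attention` are hypothetical, and the sketch uses plain Python lists rather than any particular deep learning library.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def full_attention(Q, K, V):
    """Scaled dot-product attention for every query row.

    With T queries and T keys this needs about T * T dot products.
    """
    d = len(Q[0])
    out = []
    for q in Q:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in K])
        out.append(
            [sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))]
        )
    return out


def current_step_attention(Q, K, V):
    """Attention for the last (current) query row only.

    Assumption: only the output at the current time step is needed,
    so a single query is scored against the T keys (about T dot products).
    """
    return full_attention(Q[-1:], K, V)[0]


# Toy sequence with T = 3 time steps and d = 2 features.
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = [[0.5, 0.5], [1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

full = full_attention(Q, K, V)
current = current_step_attention(Q, K, V)
```

Under this assumption, `current` matches the last row of `full`, so a task that only consumes the output at the current time step loses nothing by skipping the other query rows.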