We introduce SHARCS, an adaptive inference method that takes into account the hardness of input samples. SHARCS can train a router on any transformer network, enabling the model to direct different samples to sub-networks of varying widths. Our experiments demonstrate that: (1) SHARCS outperforms or complements existing per-sample adaptive inference methods across various classification tasks in terms of accuracy vs. FLOPs; (2) SHARCS generalizes across different architectures and can even be applied to compressed and efficient transformer encoders to further improve their efficiency; (3) SHARCS can provide a 2x inference speedup with an insignificant drop in accuracy.
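To make the idea of width-based routing concrete, the following is a minimal sketch and not the SHARCS implementation. It assumes a hardness score in [0, 1] is already available for each sample, whereas the abstract describes a trained router. The width ratios, thresholds, and the names `route`, `sliced_linear`, and `relative_flops` are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence

# Hypothetical width ratios; the abstract does not state the actual values.
WIDTH_RATIOS = (0.25, 0.5, 1.0)


def route(hardness: float, thresholds: Sequence[float] = (0.3, 0.7)) -> float:
    """Map a hardness score in [0, 1] to a width ratio.

    Illustrative fixed-threshold rule; a learned router would replace this.
    """
    for ratio, threshold in zip(WIDTH_RATIOS, thresholds):
        if hardness < threshold:
            return ratio
    return WIDTH_RATIOS[-1]


def sliced_linear(x: List[float], weight: List[List[float]],
                  bias: List[float], ratio: float) -> List[float]:
    """Apply a dense layer using only the leading fraction of its units.

    `weight` has shape [out_features][in_features]; only the first
    `ratio` fraction of rows and columns is used.
    """
    n_out = max(1, int(len(weight) * ratio))
    n_in = max(1, int(len(weight[0]) * ratio))
    return [
        sum(weight[o][i] * x[i] for i in range(n_in)) + bias[o]
        for o in range(n_out)
    ]


def relative_flops(ratio: float) -> float:
    """Approximate cost of a dense layer whose input and output widths
    are both scaled by `ratio`, relative to the full-width layer."""
    return ratio * ratio


def mean_relative_flops(hardness_scores: Sequence[float]) -> float:
    """Average relative cost over a batch routed sample by sample."""
    costs = [relative_flops(route(h)) for h in hardness_scores]
    return sum(costs) / len(costs)
```

Under these assumptions, easy samples are sent through a narrow slice of the layer and hard samples through the full width. Because a dense layer's multiply-adds scale with the product of its input and output widths, halving both cuts the cost of that layer to roughly a quarter, which is why routing many samples to narrow sub-networks can reduce average FLOPs.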