Deep Neural Networks (DNNs) tend to accrue technical debt and incur significant retraining costs when adapting to evolving requirements. Modularizing DNNs offers the promise of improving their reusability. Previous work has proposed techniques to decompose DNN models into modules both during and after training. However, these strategies suffer from several shortcomings, including significant weight overlap and accuracy loss across modules, a focus restricted to convolutional layers, and added complexity and training time from the auxiliary masks introduced to control modularity. In this work, we propose MODA, an activation-driven modular training approach. MODA promotes inherent modularity within a DNN model by directly regulating the activation outputs of its layers according to three modular objectives: intra-class affinity, inter-class dispersion, and compactness. MODA is evaluated using three well-known DNN models and five datasets of varying sizes. This evaluation indicates that, compared to the existing state of the art, MODA yields several advantages: (1) MODA accomplishes modularization with 22% less training time; (2) the modules generated by MODA comprise up to 24x fewer weights and 37x less weight overlap while (3) preserving the original model's accuracy without additional fine-tuning; and (4) in module replacement scenarios, MODA improves the accuracy of a target class by 12% on average while ensuring minimal impact on the accuracy of other classes.
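To make the three modular objectives concrete, the following is a minimal illustrative sketch, not MODA's actual formulation: it scores a layer's activation matrix against intra-class affinity (samples of the same class should activate similarly), inter-class dispersion (class-level activation patterns should diverge), and compactness (activations should be sparse so modules retain few weights). The function name, the cosine-similarity measures, and the weighting scheme are all assumptions made for exposition.

```python
# Hypothetical sketch of activation-based modular objectives; the exact
# losses used by MODA are defined in the paper, not reproduced here.
import numpy as np

def _cosine(u, v, eps=1e-8):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + eps))

def modular_objectives(acts, labels, sparsity_weight=0.1):
    """acts: (n_samples, n_units) layer activations; labels: (n_samples,) ints."""
    classes = np.unique(labels)
    means = {c: acts[labels == c].mean(axis=0) for c in classes}

    # Intra-class affinity: similarity of each sample to its class mean (maximize).
    affinity = float(np.mean([_cosine(a, means[y]) for a, y in zip(acts, labels)]))

    # Inter-class dispersion: penalize similarity between class-mean activations.
    pair_sims = [_cosine(means[c1], means[c2])
                 for i, c1 in enumerate(classes) for c2 in classes[i + 1:]]
    dispersion_penalty = float(np.mean(pair_sims)) if pair_sims else 0.0

    # Compactness: mean absolute activation, encouraging sparsity.
    compactness_penalty = float(np.abs(acts).mean())

    # A modular regularizer of this shape would be added to the usual task loss.
    loss = -affinity + dispersion_penalty + sparsity_weight * compactness_penalty
    return loss, affinity, dispersion_penalty, compactness_penalty
```

In an actual training loop, a term of this shape would be added to the standard task loss (e.g. cross-entropy) at selected layers; here it serves only to illustrate how the three objectives pull activations toward class-aligned, mutually distinct, and sparse patterns.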