Diffusion-based algorithms have emerged as promising techniques for weight generation, particularly in scenarios like multi-task learning that require frequent weight updates. However, existing solutions suffer from limited cross-task transferability. In addition, they use only the optimal weights as training samples, ignoring the value of the other weights produced during optimization. To address these issues, we propose Lt-Di, which integrates the diffusion algorithm with meta-learning to generate weights for unseen tasks. Furthermore, we extend the vanilla diffusion algorithm into a trajectory diffusion algorithm that exploits the other weights along the optimization trajectory. Trajectory diffusion decomposes the entire diffusion chain into multiple shorter ones, improving training and inference efficiency. We analyze the convergence properties of the weight generation paradigm and improve convergence efficiency without additional time overhead. Our experiments demonstrate that Lt-Di achieves higher accuracy with lower computational overhead across various tasks, including zero-shot and few-shot learning, multi-domain generalization, and large-scale language model fine-tuning. Our code is released at https://anonymous.4open.science/r/Lt-Di-0E51.
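The core idea of trajectory diffusion described above can be illustrated with a minimal toy sketch. This is not the authors' implementation; the trajectory generator, denoiser, and step counts below are all hypothetical stand-ins, chosen only to show how one long denoising chain can be decomposed into shorter segments, each anchored at an intermediate weight checkpoint from the optimization trajectory.

```python
import numpy as np

rng = np.random.default_rng(0)

def fake_optimization_trajectory(dim=8, steps=4):
    """Toy stand-in for checkpoints w_0 ... w_K saved during task training."""
    w = rng.normal(size=dim)
    traj = [w.copy()]
    for _ in range(steps):
        w = 0.5 * w + 0.1 * rng.normal(size=dim)  # toy update rule
        traj.append(w.copy())
    return traj  # traj[-1] plays the role of the "optimal" weights

def denoise_segment(x, target, n_steps):
    """Toy denoiser: one short chain that moves x toward its anchor checkpoint."""
    for _ in range(n_steps):
        x = x + (target - x) / n_steps
    return x

traj = fake_optimization_trajectory()
x0 = rng.normal(size=8)   # start generation from pure noise
x = x0.copy()
for checkpoint in traj:   # K short chains instead of one long chain
    x = denoise_segment(x, checkpoint, n_steps=5)
# x now lies close to traj[-1]: the generated weights approach the optimum
```

Each short segment only has to bridge the gap between neighboring checkpoints, which is the intuition behind the training- and inference-efficiency gains claimed in the abstract.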