The Lottery Ticket Hypothesis (LTH) showed that sparse networks can be extracted by iteratively training a model, removing the connections with the lowest global weight magnitude, and rewinding the remaining connections. This global comparison discards contextual information about how a connection relates to the other connections in its layer. Here we study ways of recovering some of this layer-wise distributional context and generalise the LTH to consider weight importance values rather than global weight magnitudes. We find that, given a repeatable training procedure, applying different importance metrics leads to distinct, performant lottery tickets with few overlapping connections. This strongly suggests that lottery tickets are not unique.
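To make the contrast between a global magnitude comparison and a layer-aware importance value concrete, the following is a minimal sketch of a single pruning step, not the procedure used in this work. It assumes NumPy and uses a hypothetical importance metric (each weight's magnitude divided by the mean magnitude of its own layer) purely for illustration; the function names, layer shapes and sparsity level are likewise assumptions. Training and rewinding are omitted.

```python
import numpy as np


def _mask_from_scores(scores, sparsity):
    """Build keep-masks by removing the `sparsity` fraction with the lowest scores.

    Scores from all layers are pooled, so the threshold is shared across layers.
    """
    flat = np.concatenate([s.ravel() for s in scores])
    k = int(round(sparsity * flat.size))  # number of connections to remove
    if k == 0:
        return [np.ones_like(s, dtype=bool) for s in scores]
    threshold = np.partition(flat, k - 1)[k - 1]  # k-th smallest score
    return [s > threshold for s in scores]


def global_magnitude_mask(layers, sparsity):
    """Score each connection by its raw |w|, as in global magnitude pruning."""
    return _mask_from_scores([np.abs(w) for w in layers], sparsity)


def layer_normalised_mask(layers, sparsity):
    """Hypothetical importance: |w| relative to the mean |w| of its own layer."""
    scores = [np.abs(w) / np.abs(w).mean() for w in layers]
    return _mask_from_scores(scores, sparsity)


def mask_overlap(masks_a, masks_b):
    """Jaccard overlap between the kept connections of two tickets."""
    a = np.concatenate([m.ravel() for m in masks_a])
    b = np.concatenate([m.ravel() for m in masks_b])
    return np.logical_and(a, b).sum() / np.logical_or(a, b).sum()


# Two toy layers whose weights live on very different scales (assumed values).
rng = np.random.default_rng(0)
layers = [rng.normal(0.0, 1.0, (20, 20)), rng.normal(0.0, 0.1, (20, 20))]

global_masks = global_magnitude_mask(layers, sparsity=0.5)
layerwise_masks = layer_normalised_mask(layers, sparsity=0.5)
overlap = mask_overlap(global_masks, layerwise_masks)
```

In this toy setting both masks keep the same number of connections, but the global comparison removes most of the small-scale second layer, while the layer-normalised score spreads the pruning more evenly. The overlap value gives one simple way to quantify how far two tickets share connections.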