LU-Net is a simple and fast architecture for invertible neural networks (INNs) that is based on the factorization of square weight matrices $\mathsf{A=LU}$, where $\mathsf{L}$ is a lower triangular matrix with ones on the diagonal and $\mathsf{U}$ is an upper triangular matrix. Instead of learning a dense matrix $\mathsf{A}$, we learn $\mathsf{L}$ and $\mathsf{U}$ separately. When combined with an invertible activation function, such layers can easily be inverted whenever the diagonal entries of $\mathsf{U}$ are nonzero. Moreover, the determinant of the Jacobian matrix of such a layer is cheap to compute, because the determinant of a triangular matrix is the product of its diagonal entries. Consequently, the LU architecture allows for cheap computation of the likelihood via the change of variables formula and can be trained according to the maximum likelihood principle. In our numerical experiments, we test the LU-Net architecture as a generative model on several academic datasets. We also provide a detailed comparison with conventional invertible neural networks in terms of performance, training, and run time.
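To make the inversion and the log-determinant concrete, the following is a minimal sketch of a single LU layer in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the layer form $y = \phi(\mathsf{LU}x + b)$, the bias $b$, the choice of leaky ReLU for $\phi$ with slope `ALPHA`, and all function names are hypothetical choices made for this example. Under these assumptions the log-determinant of the Jacobian is $\sum_i \log|\mathsf{U}_{ii}| + \sum_i \log \phi'(z_i)$, and the inverse reduces to one forward and one back substitution.

```python
import math

ALPHA = 0.1  # leaky-ReLU slope; an assumed choice of invertible activation


def leaky_relu(z):
    return z if z >= 0 else ALPHA * z


def leaky_relu_inv(y):
    return y if y >= 0 else y / ALPHA


def matvec(M, v):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def forward(L, U, b, x):
    """Return y = phi(L U x + b) and log|det J| of the layer at x."""
    z = [zi + bi for zi, bi in zip(matvec(L, matvec(U, x)), b)]
    y = [leaky_relu(zi) for zi in z]
    # det(L) = 1 (unit diagonal), det(U) = product of its diagonal entries.
    log_det = sum(math.log(abs(U[i][i])) for i in range(len(x)))
    # The activation contributes log(ALPHA) for each negative pre-activation.
    log_det += sum(math.log(ALPHA) for zi in z if zi < 0)
    return y, log_det


def inverse(L, U, b, y):
    """Recover x from y; requires nonzero diagonal entries of U."""
    n = len(y)
    z = [leaky_relu_inv(yi) - bi for yi, bi in zip(y, b)]
    # Forward substitution for L w = z (L has ones on the diagonal).
    w = [0.0] * n
    for i in range(n):
        w[i] = z[i] - sum(L[i][j] * w[j] for j in range(i))
    # Back substitution for U x = w.
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (w[i] - sum(U[i][j] * x[j] for j in range(i + 1, n))) / U[i][i]
    return x


# Small example with made-up parameters.
L = [[1.0, 0.0], [0.5, 1.0]]
U = [[2.0, 1.0], [0.0, -3.0]]
b = [0.1, -0.2]
x = [1.0, 2.0]

y, log_det = forward(L, U, b, x)
x_rec = inverse(L, U, b, y)
```

With a base density $p_Z$, the change of variables formula then gives $\log p_X(x) = \log p_Z(f(x)) + \log|\det J_f(x)|$, where the second term is the `log_det` value returned by `forward`. Both substitutions cost $O(n^2)$ per sample, which is the same order as the forward pass.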