Sorting is a fundamental operation in computer systems and a long-standing research topic. Going beyond the problem formulation of traditional sorting algorithms, we consider sorting problems for more abstract yet expressive inputs, e.g., multi-digit images and image fragments, through a neural sorting network. Learning a mapping from a high-dimensional input to an ordinal variable requires the sorting network to be differentiable. In this paper, we define the softening error induced by a differentiable swap function and develop an error-free swap function that satisfies the non-decreasing and differentiability conditions. Furthermore, we adopt a permutation-equivariant Transformer network with multi-head attention to capture dependencies between the given inputs and to leverage the model capacity of self-attention. Experiments on diverse sorting benchmarks show that our methods perform better than or comparably to baseline methods.
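To make the notion of a softening error concrete, the following is a minimal sketch of one common way to relax a swap, assuming a sigmoid-weighted convex combination of the two inputs. The abstract does not specify the form of the differentiable swap function, so the sigmoid choice, the `steepness` parameter, and the function names `soft_swap` and `softening_error` are illustrative assumptions rather than the paper's definitions. The error-free swap function itself is not reproduced here.

```python
import math


def sigmoid(x, steepness=1.0):
    """Logistic function; larger steepness gives a sharper transition."""
    return 1.0 / (1.0 + math.exp(-steepness * x))


def soft_swap(a, b, steepness=1.0):
    """Return (soft_min, soft_max) as convex combinations of a and b.

    Assumed relaxation: the weight p tends to 1 when a < b, so soft_min
    tends to a and soft_max tends to b as steepness grows.
    """
    p = sigmoid(b - a, steepness)
    soft_min = p * a + (1.0 - p) * b
    soft_max = (1.0 - p) * a + p * b
    return soft_min, soft_max


def softening_error(a, b, steepness=1.0):
    """Total absolute gap between the soft outputs and the hard min/max."""
    soft_min, soft_max = soft_swap(a, b, steepness)
    return abs(soft_min - min(a, b)) + abs(soft_max - max(a, b))


if __name__ == "__main__":
    for k in (1.0, 5.0, 20.0):
        print(k, soft_swap(1.0, 3.0, k), softening_error(1.0, 3.0, k))
```

Under this assumed relaxation the soft outputs lie strictly between the two inputs, so the error is nonzero for any finite `steepness` and shrinks only as the sigmoid sharpens, at which point gradients begin to vanish. That trade-off is the kind of gap an error-free swap function is meant to close.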