Neural Architecture Search (NAS) has shown promising capability in learning text representations. However, existing text-based NAS methods neither perform a learnable fusion of neural operations to optimize the architecture, nor encode the latent hierarchical categorization behind text input. This paper presents a novel NAS method, Discretized Differentiable Neural Architecture Search (DDNAS), for text representation learning and classification. With a continuous relaxation of the architecture representation, DDNAS can use gradient descent to optimize the search. We also propose a novel discretization layer based on mutual information maximization, which is imposed on every search node to model the latent hierarchical categorization in text representation. Extensive experiments on eight diverse real-world datasets show that DDNAS consistently outperforms state-of-the-art NAS methods. Although DDNAS relies on only three basic operations (convolution, pooling, and none) as candidate NAS building blocks, it achieves promising performance, and it can be extended with additional operations to obtain further improvement.
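To make the idea of a continuous relaxation over candidate operations concrete, the following is a minimal sketch of a softmax-weighted mixture of the three operations named in the abstract (convolution, pooling, and none). The abstract does not specify the exact form of the relaxation, so the softmax weighting used here (in the style of common differentiable NAS formulations) is an assumption, as are all names (`mixed_op`, `alpha`, `kernel`), the kernel size, and the window size. The sketch covers only the forward pass of a single search edge; it omits the gradient-based update of `alpha` and the mutual-information discretization layer.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def conv1d_same(x, kernel):
    """Sliding-window weighted sum along the sequence axis, 'same' zero padding.

    x has shape (seq_len, dim); kernel has shape (k,) with k odd.
    The same kernel is applied to every feature dimension (a simplification).
    """
    k = len(kernel)
    pad = k // 2
    padded = np.pad(x, ((pad, pad), (0, 0)))
    return np.stack([
        (padded[i:i + k] * kernel[:, None]).sum(axis=0)
        for i in range(x.shape[0])
    ])

def max_pool_same(x, k=3):
    """Max pooling along the sequence axis with stride 1 and 'same' padding."""
    pad = k // 2
    padded = np.pad(x, ((pad, pad), (0, 0)), constant_values=-np.inf)
    return np.stack([padded[i:i + k].max(axis=0) for i in range(x.shape[0])])

def none_op(x):
    """The 'none' operation: contributes nothing on this edge."""
    return np.zeros_like(x)

def mixed_op(x, alpha, kernel):
    """Relaxed edge: softmax(alpha)-weighted sum of all candidate operations.

    alpha holds one real-valued architecture parameter per candidate,
    in the (assumed) order: convolution, pooling, none.
    """
    weights = softmax(alpha)
    candidates = [conv1d_same(x, kernel), max_pool_same(x), none_op(x)]
    return sum(w * c for w, c in zip(weights, candidates))

# Toy input: 6 tokens, each with a 4-dimensional embedding (hypothetical sizes).
rng = np.random.default_rng(0)
x = rng.normal(size=(6, 4))
kernel = np.array([0.25, 0.5, 0.25])
alpha = np.zeros(3)  # equal weight on every candidate
y = mixed_op(x, alpha, kernel)
print(y.shape)
```

Because the output is a smooth function of `alpha`, the choice among operations can in principle be optimized with gradient descent instead of a discrete search; in practice an automatic differentiation framework would supply the gradients.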