Knowledge-integrated AutoEncoder Model

Data encoding is a common and central operation in most data analysis tasks. The performance of downstream models in a computational pipeline depends heavily on the quality of the encoding. One of the most powerful ways to encode data is the neural-network AutoEncoder (AE) architecture. However, developers of AE models cannot easily influence the embedding space that is produced, because the AE is usually treated as a \textit{black box} technique; the embedding is therefore hard to control and does not necessarily have the properties desired for downstream tasks. In this paper, we introduce a novel approach for developing AE models that can integrate external knowledge sources into the learning process, possibly leading to more accurate results. The proposed \methodNamefull{} (\methodName{}) model leverages domain-specific information to ensure that the desired distance and neighborhood properties between samples are preserved in the embedding space. The proposed model is evaluated on three large-scale datasets from three different scientific fields and is compared with nine existing encoding models. The results demonstrate that the \methodName{} model effectively captures the underlying structures and relationships between the input data and external knowledge, meaning that it generates a more useful representation. As a result, it outperforms the other models in terms of reconstruction accuracy.
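The idea of preserving knowledge-derived distances in the embedding space can be illustrated with a small sketch. The abstract does not give the model's loss function, so everything below is an assumption made for illustration: the function names `pairwise_distances` and `knowledge_regularized_loss`, the `target_dist` matrix standing in for the external knowledge, and the `weight` parameter are hypothetical. The sketch simply adds, to the usual mean-squared reconstruction error, a penalty on the mismatch between pairwise embedding distances and the target distances.

```python
import numpy as np


def pairwise_distances(z):
    """Euclidean distance between every pair of rows in z (shape: n x n)."""
    diff = z[:, None, :] - z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def knowledge_regularized_loss(x, x_hat, z, target_dist, weight=1.0):
    """Hypothetical loss, not the paper's formulation.

    x           : input samples, shape (n, d)
    x_hat       : reconstructions produced by the decoder, shape (n, d)
    z           : embeddings produced by the encoder, shape (n, k)
    target_dist : assumed knowledge-derived distances, shape (n, n)
    weight      : assumed trade-off between the two terms
    """
    # Standard autoencoder term: how well the input is reconstructed.
    reconstruction = np.mean((x - x_hat) ** 2)
    # Knowledge term: how far embedding distances are from the targets.
    distance_penalty = np.mean((pairwise_distances(z) - target_dist) ** 2)
    return reconstruction + weight * distance_penalty
```

In a trainable model, the same two terms would be written in an automatic-differentiation framework so that gradients of the distance penalty flow back into the encoder. How the two terms are balanced is a design choice that this sketch leaves to the assumed `weight` parameter.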

