Interpolation of scattered data is a classical problem in numerical analysis with a long history of theoretical and practical contributions. Recent approaches use deep neural networks to construct interpolators and show strong, generalizable performance. However, they still fall short in two respects: \textbf{1) inadequate representation learning}, resulting from the separate embedding of observed and target points in popular encoder-decoder frameworks, and \textbf{2) limited generalization power}, caused by overlooking prior interpolation knowledge shared across domains. To overcome these limitations, we present a \textbf{N}umerical \textbf{I}nterpolation approach using \textbf{E}ncoder \textbf{R}epresentation of \textbf{T}ransformers (\textbf{NIERT}). First, NIERT uses an encoder-only framework rather than an encoder-decoder structure. It can therefore embed observed and target points into a unified encoder representation space, which lets it exploit the correlations among them and obtain more precise representations. Second, we pre-train NIERT on large-scale synthetic mathematical functions to acquire prior interpolation knowledge and transfer it to multiple interpolation domains, with consistent performance gains. On both synthetic and real-world datasets, NIERT outperforms existing approaches by a large margin: 4.3$\sim$14.3$\times$ lower MAE on TFRD subsets and 1.7/1.8/8.7$\times$ lower MSE on the Mathit/PhysioNet/PTV datasets, respectively. The source code of NIERT is available at https://github.com/DingShizhe/NIERT.
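To make the idea of a unified representation space concrete, the following minimal sketch shows one way observed and target points could be packed into a single token sequence for an encoder-only model. The abstract does not specify the input encoding, so everything here is an assumption for illustration: the function name `build_unified_sequence`, the placeholder `MASK_VALUE`, and the trailing `is_target` flag are hypothetical and are not taken from the NIERT source code.

```python
from typing import List, Sequence, Tuple

# Hypothetical placeholder standing in for the unknown value at a target point.
MASK_VALUE = 0.0


def build_unified_sequence(
    observed: Sequence[Tuple[Sequence[float], float]],
    targets: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Return one token per point, laid out as [*coords, value, is_target].

    Observed points carry their known value and is_target = 0.0.
    Target points carry MASK_VALUE and is_target = 1.0, so a single
    encoder can process both kinds of points in one sequence.
    """
    tokens: List[List[float]] = []
    for coords, value in observed:
        tokens.append([*coords, value, 0.0])
    for coords in targets:
        tokens.append([*coords, MASK_VALUE, 1.0])
    return tokens


# Two observed 1-D points and one target location between them.
observed_points = [((0.0,), 1.0), ((1.0,), 3.0)]
target_points = [(0.5,)]
sequence = build_unified_sequence(observed_points, target_points)
```

In an encoder-decoder design, by contrast, the observed points would typically be embedded by the encoder and the target locations by the decoder; placing both in one sequence is what would allow a single encoder to relate them directly.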