Many modern datasets, from areas such as neuroimaging and geostatistics, come in the form of a random sample of tensor-valued data which can be understood as noisy observations of a smooth multidimensional random function. Most of the traditional techniques from functional data analysis are plagued by the curse of dimensionality and quickly become intractable as the dimension of the domain increases. In this paper, we propose a framework for learning continuous representations from a sample of multidimensional functional data that is immune to several manifestations of the curse. These representations are constructed using a set of separable basis functions that are defined to be optimally adapted to the data. We show that the resulting estimation problem can be solved efficiently by the tensor decomposition of a carefully defined reduction transformation of the observed data. Roughness-based regularization is incorporated using a class of differential operator-based penalties. Relevant theoretical properties are also established. The advantages of our method over competing methods are demonstrated in a simulation study. We conclude with a real data application in neuroimaging.
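To make the idea of a separable representation concrete, the following is a minimal sketch for the simplest case of a two-dimensional domain. It is not the estimator proposed in the paper: it omits the reduction transformation, the roughness penalties, and the multi-subject sample, and it substitutes a plain truncated SVD, which is the matrix special case of a tensor decomposition. The function name `separable_approx`, the test surface, and the noise level are all assumptions chosen for illustration.

```python
import numpy as np

def separable_approx(Y, rank):
    """Rank-`rank` separable approximation of a 2-D array via truncated SVD.

    Each retained term is an outer product u_k v_k^T, i.e. a product of a
    function of the first coordinate and a function of the second.
    """
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank, :]

# Hypothetical smooth surface built from two separable terms, observed on a grid.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 60)
y = np.linspace(0.0, 1.0, 50)
F = (np.outer(np.sin(2 * np.pi * x), np.cos(2 * np.pi * y))
     + 0.5 * np.outer(x ** 2, np.exp(-y)))

# Noisy observation of the surface (noise level is an arbitrary assumption).
Y = F + 0.1 * rng.standard_normal(F.shape)

# Fit a two-term separable approximation and compare errors against the truth.
F_hat = separable_approx(Y, rank=2)
err_noisy = np.linalg.norm(Y - F)
err_hat = np.linalg.norm(F_hat - F)
print(f"error of raw observation: {err_noisy:.3f}")
print(f"error of rank-2 separable fit: {err_hat:.3f}")
```

In this toy setting the low-rank fit only needs on the order of `rank * (60 + 50)` numbers rather than `60 * 50`, which hints at why separable structure helps as the dimension of the domain grows.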