Functional data analysis is typically performed in two steps: first, discrete observations are given a functional representation, and then functional methods are applied to the represented data. The initial choice of functional representation can have a significant impact on the second step, as shown in recent research in which data-driven spline bases outperformed a predefined, rigid choice of representation. That method selects the initial functional basis by placing the knots efficiently with a simple machine-learning algorithm. The approach does not apply directly when the data are defined on domains of dimension higher than one, such as images. The reason is that, in higher dimensions, the convenient and numerically efficient spline bases are obtained as tensor products of 1D spline bases, which requires the knots to lie on a lattice. This rules out the flexible knot placement that was fundamental to the 1D approach. The goal of this research is to propose two modified approaches that circumvent the problem by encoding the irregular knot selection in knot densities and using these densities through the topology of the spline spaces. This permits regular knot grids and thus facilitates the use of tensor spline bases. The proposed approaches are tested on 1D data, where their performance is comparable to or better than that of the previous methods.
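To make the lattice constraint concrete, the following minimal sketch builds a 2D tensor-product B-spline basis from two 1D bases in plain Python. It illustrates the constraint only and is not the method proposed in this work. The function names, the knot values, and the cubic degree are illustrative assumptions; the 1D bases are evaluated with the standard Cox-de Boor recursion.

```python
def clamped_knots(interior, a, b, degree):
    """Knot vector on [a, b] with end knots repeated degree + 1 times."""
    return [a] * (degree + 1) + list(interior) + [b] * (degree + 1)


def bspline_basis(x, knots, degree):
    """Values at x of all B-spline basis functions (Cox-de Boor recursion)."""
    # Degree 0: indicator functions of the half-open knot intervals.
    b = [1.0 if knots[i] <= x < knots[i + 1] else 0.0
         for i in range(len(knots) - 1)]
    # Assign the right endpoint to the last non-empty interval.
    if x == knots[-1]:
        for i in range(len(knots) - 2, -1, -1):
            if knots[i] < knots[i + 1]:
                b[i] = 1.0
                break
    # Raise the degree step by step; terms with a zero denominator are dropped.
    for d in range(1, degree + 1):
        nxt = []
        for i in range(len(knots) - d - 1):
            left = right = 0.0
            if knots[i + d] > knots[i]:
                left = (x - knots[i]) / (knots[i + d] - knots[i]) * b[i]
            if knots[i + d + 1] > knots[i + 1]:
                right = ((knots[i + d + 1] - x)
                         / (knots[i + d + 1] - knots[i + 1]) * b[i + 1])
            nxt.append(left + right)
        b = nxt
    return b


def tensor_basis(x, y, knots_x, knots_y, degree):
    """Values at (x, y) of the 2D basis functions B_i(x) * B_j(y)."""
    bx = bspline_basis(x, knots_x, degree)
    by = bspline_basis(y, knots_y, degree)
    return [[bi * bj for bj in by] for bi in bx]


# Hypothetical interior knots: each axis may be spaced irregularly on its own.
interior_x = [0.2, 0.5, 0.9]
interior_y = [0.3, 0.6]
degree = 3

knots_x = clamped_knots(interior_x, 0.0, 1.0, degree)
knots_y = clamped_knots(interior_y, 0.0, 1.0, degree)

values = tensor_basis(0.4, 0.7, knots_x, knots_y, degree)
total = sum(sum(row) for row in values)

# The 2D interior knots are forced to be the full product grid of the two
# 1D knot sets; a single knot cannot be moved without moving a whole line.
lattice = [(u, v) for u in interior_x for v in interior_y]
```

In this sketch the knots along each axis can still be irregular, but the 2D knot set is always the Cartesian product of the two 1D sets. Refining the basis near one point of the plane therefore adds a whole row or column of knots, which is the loss of flexibility that motivates the density-based approaches.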