Functional data analysis is typically performed in two steps: first, discrete observations are given a functional representation, and then functional methods are applied to the data so represented. The initial choice of functional representation can significantly affect the second step, as recent research has shown: data-driven spline bases outperformed a predefined, rigid choice of representation. That method selects the initial functional basis by placing the knots efficiently with a simple machine-learning algorithm. It does not apply directly when the data are defined on domains of dimension higher than one, such as images. The reason is that in higher dimensions the convenient and numerically efficient spline bases are obtained as tensor products of 1D spline bases, which requires the knots to lie on a lattice. This rules out the flexible knot placement that was fundamental to the 1D approach. The goal of this research is to propose two modified approaches that circumvent the problem by encoding the irregular knot selection into knot densities and utilizing these densities through the topology of the spline spaces. This allows the knots to be placed on regular grids and thus facilitates the use of tensor spline bases. The proposed approaches are tested on 1D data, showing performance comparable to or better than that of the previous methods.
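The lattice constraint described above can be illustrated with a minimal sketch. It is not the method proposed in this work; it only shows how a 2D tensor spline basis is assembled from two 1D B-spline bases, assuming SciPy's `BSpline` class, cubic splines on the unit square, and hypothetical knot values chosen purely for illustration. The helper name `bspline_design` is likewise an assumption.

```python
import numpy as np
from scipy.interpolate import BSpline


def bspline_design(x, interior_knots, degree=3, lo=0.0, hi=1.0):
    """Evaluate every B-spline basis function of a clamped knot vector at x."""
    t = np.concatenate(
        [[lo] * (degree + 1), np.sort(interior_knots), [hi] * (degree + 1)]
    )
    n_basis = len(t) - degree - 1
    # Identity coefficients make the spline return one column per basis function.
    return BSpline(t, np.eye(n_basis), degree)(x)


# Hypothetical knots: irregular along each axis, yet the 2D knots are
# necessarily the full lattice formed by knots_x and knots_y.
knots_x = np.array([0.1, 0.15, 0.2, 0.6])
knots_y = np.array([0.5, 0.8])

x = np.linspace(0.0, 1.0, 40)
y = np.linspace(0.0, 1.0, 30)
Bx = bspline_design(x, knots_x)  # shape (40, 8)
By = bspline_design(y, knots_y)  # shape (30, 6)

# Tensor basis on the pixel grid: one column per pair of 1D basis functions.
B2d = np.kron(Bx, By)            # shape (1200, 48)

# Least-squares representation of a synthetic "image" in the tensor basis.
X, Y = np.meshgrid(x, y, indexing="ij")
image = np.sin(2 * np.pi * X) * np.cos(np.pi * Y)
coef, *_ = np.linalg.lstsq(B2d, image.ravel(), rcond=None)
fitted = (B2d @ coef).reshape(image.shape)
```

Refining the knots near `x = 0.15` adds a whole line of lattice knots across every value of `y`, so a single 2D location cannot be refined in isolation. This is the restriction that motivates moving the irregularity out of the knot positions.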