Functional linear discriminant analysis (FLDA) is a powerful tool that extends LDA-based multiclass classification and dimension reduction to univariate time-series functions. However, in the age of large, multivariate, and incomplete data sets, there is a need for a computationally tractable approach that accounts for the statistical dependencies between features and can handle missing values. Here, we develop a multivariate version of FLDA (MUDRA) to address this need and describe an efficient expectation/conditional-maximization (ECM) algorithm to infer its parameters. We assess its predictive power on the "Articulary Word Recognition" data set and show that it improves on the state of the art, especially when data are missing. MUDRA allows interpretable classification of data sets with large proportions of missing data, which will be particularly useful for medical or psychological data sets.