Voice biometric tasks, such as age estimation, require modeling the often complex relationship between voice features and the biometric variable. While deep learning models can handle such complexity, they typically require large amounts of accurately labeled data to perform well. Such data are often scarce for biometric tasks such as voice-based age prediction. Simpler models such as linear regression, on the other hand, can work with smaller datasets but often fail to capture the non-linear patterns present in the data. In this paper, we propose the Tessellated Linear Model (TLM), a piecewise linear approach that combines the simplicity of linear models with the capacity of non-linear functions. TLM tessellates the feature space into convex regions and fits a linear model within each region. We optimize the tessellation and the linear models jointly using hierarchical greedy partitioning. We evaluated TLM on the TIMIT dataset on the task of age prediction from voice, where it outperformed state-of-the-art deep learning models.
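To make the idea of a piecewise linear model over a tessellated feature space concrete, the following is a minimal sketch, not the authors' implementation. It assumes axis-aligned threshold splits (each split is a hyperplane, so every resulting region is convex), candidate thresholds taken at feature quartiles, and ordinary least squares within each region. The names `fit_linear`, `build`, and `predict_one`, and the parameters `max_depth` and `min_samples`, are hypothetical and chosen only for illustration; the abstract does not specify the splitting rule actually used by TLM.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares fit with intercept; returns (weights, sum of squared errors)."""
    A = np.hstack([X, np.ones((len(X), 1))])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    sse = float(np.sum((A @ w - y) ** 2))
    return w, sse

def build(X, y, depth=0, max_depth=2, min_samples=10):
    """Greedily split a region in two whenever that lowers the total squared error."""
    w, sse = fit_linear(X, y)
    node = {"w": w}
    if depth >= max_depth or len(X) < 2 * min_samples:
        return node
    best = None
    for j in range(X.shape[1]):
        # Assumption: candidate cuts are the quartiles of each feature.
        for t in np.quantile(X[:, j], [0.25, 0.5, 0.75]):
            left = X[:, j] <= t
            if left.sum() < min_samples or (~left).sum() < min_samples:
                continue
            total = fit_linear(X[left], y[left])[1] + fit_linear(X[~left], y[~left])[1]
            if best is None or total < best[0]:
                best = (total, j, t, left)
    if best is None or best[0] >= sse:
        return node  # no split improves on a single linear model
    _, j, t, left = best
    node.update(
        feature=j,
        threshold=t,
        left=build(X[left], y[left], depth + 1, max_depth, min_samples),
        right=build(X[~left], y[~left], depth + 1, max_depth, min_samples),
    )
    return node

def predict_one(node, x):
    """Route x to its region, then apply that region's linear model."""
    while "feature" in node:
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return float(np.append(x, 1.0) @ node["w"])
```

On a toy target such as y = |x|, a single linear model fits poorly, whereas this sketch splits near zero and fits each half with its own line. Axis-aligned cuts are the simplest choice that keeps regions convex; oblique hyperplanes would also satisfy the convexity requirement at higher search cost.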