We propose a new method for clustering infant vocalizations, based on audio recordings made once a month during the first 12 months of a child's life. We use a topologically augmented representation in which each vocalization is described by two persistence diagrams: one computed on the surface of its spectrogram and one on its Takens' embedding. A synthetic persistent variable is derived from each diagram and added to the Mel-frequency cepstral coefficients (MFCCs). Using this representation, we fit a non-parametric Bayesian mixture model with a Dirichlet process prior, so that the number of components is inferred from the data rather than fixed in advance. This procedure yields a novel data-driven categorization of vocal productions. Our findings reveal 8 clusters of vocalizations, whose temporal distributions and acoustic profiles we compare across the first 12 months of life.
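To make the topological part of the representation concrete, the following is a minimal sketch, not the authors' implementation. It delay-embeds a toy signal (a Takens' embedding), computes the 0-dimensional persistence of the resulting point cloud, and reduces it to one number. The function names, the choice of `dim` and `delay`, the restriction to 0-dimensional homology, and the use of total persistence as the summary are all assumptions made for illustration; the abstract does not specify how the synthetic persistent variable is defined. The spectrogram-surface diagram, the MFCCs, and the Dirichlet process mixture are not shown.

```python
import math
from itertools import combinations


def takens_embedding(signal, dim=3, delay=2):
    """Delay-embed a 1-D signal into points (x[t], x[t+delay], ..., x[t+(dim-1)*delay])."""
    span = (dim - 1) * delay
    return [
        tuple(signal[t + k * delay] for k in range(dim))
        for t in range(len(signal) - span)
    ]


def h0_persistence(points):
    """Death scales of connected components in a Vietoris-Rips filtration.

    Every component is born at scale 0 and one component dies each time an
    edge merges two of them, so the deaths are the edge lengths of a minimum
    spanning tree (Kruskal's algorithm with union-find). Convention assumed
    here: an edge enters the filtration at its full Euclidean length.
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            deaths.append(length)
    return deaths  # the single component that never dies is omitted


def total_persistence(deaths):
    """Hypothetical one-number summary: sum of lifetimes (births are 0 in H0)."""
    return sum(deaths)


# Toy stand-in for a vocalization: a sampled sine wave (assumption, not real data).
signal = [math.sin(2 * math.pi * t / 16) for t in range(64)]
cloud = takens_embedding(signal, dim=3, delay=4)
deaths = h0_persistence(cloud)
summary = total_persistence(deaths)
```

In a pipeline like the one described, a scalar such as `summary` would be appended to the MFCC feature vector of each vocalization before clustering. The all-pairs edge list makes this sketch quadratic in the number of points, so a dedicated persistent homology library would be the more practical choice for real recordings.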