We collected data from the Google AudioSet database by selecting a subset of the samples annotated with \textit{laughter}. The selection criterion was that a sample presents a communicative act with a clear connotation of being either positive (laughing with) or negative (being laughed at). On the basis of this annotated data, we performed two experiments. First, we manually extracted and analyzed phonetic features. Second, we conducted several machine learning experiments by systematically combining several automatically extracted acoustic feature sets with machine learning algorithms. The results show that the best-performing models achieve an unweighted average recall of 0.7.
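For readers unfamiliar with the metric, unweighted average recall (UAR) is the mean of the per-class recalls, so each class counts equally regardless of how many samples it has. The following is a minimal sketch of that computation, not the evaluation code used in the experiments; the function name `unweighted_average_recall` and the label strings `"with"` and `"at"` are assumptions made purely for illustration, and the toy labels are invented.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Return the mean of per-class recalls (each class weighted equally)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    total = defaultdict(int)    # number of true samples per class
    correct = defaultdict(int)  # number of correctly predicted samples per class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1

    # Recall for a class = correct predictions / true samples of that class.
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Hypothetical, imbalanced toy labels: "with" = laughing with, "at" = laughed at.
y_true = ["with"] * 8 + ["at"] * 2
y_pred = ["with"] * 8 + ["with", "at"]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UAR = {uar:.2f}, accuracy = {accuracy:.2f}")
```

In this toy case the majority class is recognized perfectly and the minority class only half of the time, so plain accuracy looks higher than UAR. This is why UAR is a common choice when the two classes are not equally frequent.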