We present a contrastive learning approach for extracting low-dimensional representations of the acoustic environment from single-channel, reverberant speech signals. Convolving room impulse responses (RIRs) with anechoic source signals serves as a data augmentation technique that offers considerable flexibility in the design of the upstream task. We evaluate the embeddings on three downstream tasks: regression of the reverberation time (RT60), regression of the clarity index (C50), and classification of rooms as small or large. The learned representations generalize well to unseen data and perform similarly to a fully supervised baseline.
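The RIR-based augmentation described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the helper names (`synthetic_rir`, `convolve`, `make_positive_pair`), the toy RIR model (exponentially decaying Gaussian noise), and the pairing rule, which treats two different dry signals convolved with the same RIR as a positive pair. The abstract does not state how pairs are actually formed.

```python
import math
import random


def synthetic_rir(rt60_s, sample_rate=8000, num_samples=400, seed=0):
    """Toy RIR (assumption): Gaussian noise whose envelope drops 60 dB after rt60_s."""
    rng = random.Random(seed)
    # An amplitude factor of 10**-3 corresponds to -60 dB, so decay = ln(1000) / RT60.
    decay = 3.0 * math.log(10.0) / rt60_s
    return [
        rng.gauss(0.0, 1.0) * math.exp(-decay * n / sample_rate)
        for n in range(num_samples)
    ]


def convolve(signal, rir):
    """Direct linear convolution; output length is len(signal) + len(rir) - 1."""
    out = [0.0] * (len(signal) + len(rir) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(rir):
            out[i + j] += s * h
    return out


def make_positive_pair(source_a, source_b, rir):
    """Hypothetical pairing rule: two dry signals that share one acoustic environment."""
    return convolve(source_a, rir), convolve(source_b, rir)
```

The direct convolution is quadratic in signal length and is only meant to keep the sketch dependency-free; for real speech an FFT-based routine would be the usual choice. Because the source signal and the RIR are chosen independently, the same pool of anechoic recordings can be combined with any set of measured or simulated RIRs, which is the flexibility the abstract refers to.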