Audio recordings can provide important evidence in criminal investigations. One such use is the forensic association of a recording with the location where it was made. For example, a voice message may be the only investigative cue for narrowing down the candidate sites of a crime. To date, several works have provided tools for closed-set recording environment classification under relatively clean recording conditions. However, in forensic investigations, the candidate locations are case-specific. Closed-set tools are therefore not applicable without retraining on a sufficient number of training samples for each case and its candidate set. In addition, a forensic tool has to deal with audio material from uncontrolled sources with variable properties and quality. In this work, we therefore attempt a major step towards practical forensic application scenarios. We propose a representation learning framework called EnvId, short for environment identification. EnvId avoids case-specific retraining. Instead, it is the first tool for robust few-shot classification of unseen environment locations. We demonstrate that EnvId can handle forensically challenging material. It provides good-quality predictions even under unseen signal degradations, environment characteristics, or recording position mismatches. Our code and datasets will be made publicly available upon acceptance.
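To make the idea of few-shot classification without case-specific retraining more concrete, the following is a minimal sketch of one common approach: nearest-prototype matching in a fixed embedding space. The abstract does not state that EnvId classifies this way, so this should be read as an illustration of the general setting only. The function names (`build_prototypes`, `classify`), the cosine-similarity rule, and the synthetic 8-dimensional "embeddings" are all assumptions introduced here; in practice the embeddings would come from a trained audio encoder.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to unit length."""
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norm, eps)


def build_prototypes(support_embeddings, support_labels):
    """Average the normalized support embeddings of each candidate location.

    support_embeddings: array of shape (n_support, dim)
    support_labels: sequence of n_support location names
    Returns the sorted label list and a (n_locations, dim) prototype matrix.
    """
    support_embeddings = np.asarray(support_embeddings, dtype=float)
    label_array = np.asarray(support_labels)
    labels = sorted(set(support_labels))
    prototypes = np.stack([
        l2_normalize(support_embeddings[label_array == lab]).mean(axis=0)
        for lab in labels
    ])
    return labels, l2_normalize(prototypes)


def classify(query_embeddings, labels, prototypes):
    """Assign each query to the location with the most similar prototype."""
    queries = l2_normalize(np.asarray(query_embeddings, dtype=float))
    similarities = queries @ prototypes.T  # cosine similarity
    return [labels[i] for i in similarities.argmax(axis=1)]


# Synthetic stand-ins for encoder outputs (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
center_a = np.eye(8)[0]
center_b = np.eye(8)[1]
support = np.vstack([
    center_a + 0.05 * rng.standard_normal((3, 8)),  # 3 shots for "room_a"
    center_b + 0.05 * rng.standard_normal((3, 8)),  # 3 shots for "room_b"
])
support_names = ["room_a"] * 3 + ["room_b"] * 3
queries = np.vstack([
    center_a + 0.05 * rng.standard_normal(8),
    center_b + 0.05 * rng.standard_normal(8),
])

names, protos = build_prototypes(support, support_names)
print(classify(queries, names, protos))
```

The point of the sketch is that a new case only requires a handful of reference recordings per candidate location to form the prototypes; the encoder itself stays fixed, which is what distinguishes this setting from closed-set classifiers that must be retrained for every candidate set.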