Segmenting speech transcripts into thematic sections benefits both downstream processing and users who depend on written text for accessibility. We introduce a novel approach to hierarchical topic segmentation in transcripts, generating multi-level tables of contents that capture both topic and subtopic boundaries. We compare zero-shot prompting and LoRA fine-tuning on large language models, while also exploring the integration of high-level speech pause features. Evaluations on English meeting recordings and multilingual lecture transcripts (Portuguese, German) show significant improvements over established topic segmentation baselines. Additionally, we adapt a common evaluation measure for multi-level segmentation, taking into account all hierarchical levels within one metric.