PODTILE: Facilitating Podcast Episode Browsing with Auto-generated Chapters

Azin Ghazimatin, Ekaterina Garmash, Gustavo Penha, Kristen Sheets, Martin Achenbach, Oguz Semerci, Remi Galvez, Marcus Tannenberg, Sahitya Mantravadi, Divya Narayanan, Ofeliya Kalaydzhyan, Douglas Cole, Ben Carterette, Ann Clifton, Paul N. Bennett, Claudia Hauff, Mounia Lalmas

From arXiv; 9 pages, 4 figures; CIKM 2024, Industry Track

Listeners of long-form talk-audio content, such as podcast episodes, often find it challenging to understand the overall structure and locate relevant sections. A practical solution is to divide episodes into chapters, i.e., semantically coherent segments labeled with titles and timestamps. Since most episodes on our platform at Spotify currently lack creator-provided chapters, automating the creation of chapters is essential. Scaling the chapterization of podcast episodes presents unique challenges. First, episodes tend to be less structured than written texts, featuring spontaneous discussions with nuanced transitions. Second, the transcripts are usually lengthy, averaging about 16,000 tokens, which necessitates efficient processing that can preserve context. To address these challenges, we introduce PODTILE, a fine-tuned encoder-decoder transformer to segment conversational data. The model simultaneously generates chapter transitions and titles for the input transcript. To preserve context, each input text is augmented with global context, including the episode's title, description, and previous chapter titles. In our intrinsic evaluation, PODTILE achieved an 11% improvement in ROUGE score over the strongest baseline. Additionally, we provide insights into the practical benefits of auto-generated chapters for listeners navigating episode content. Our findings indicate that auto-generated chapters serve as a useful tool for engaging with less popular podcasts. Finally, we present empirical evidence that using chapter titles can enhance the effectiveness of sparse retrieval in search tasks.
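The abstract describes processing long transcripts piece by piece while carrying global context (episode title, description, and previously generated chapter titles) into each input. A minimal sketch of that control flow is shown here under explicit assumptions: whitespace tokens, non-overlapping fixed-size windows, an invented text layout for the augmented input, and a stand-in `model` callable in place of the fine-tuned encoder-decoder. The names `chunk_tokens`, `build_input`, `chapterize`, `window`, and `max_prev` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical model interface: given the context-augmented input and the raw
# chunk, return (chunk-local start index, chapter title) pairs. In PODTILE this
# role is played by the fine-tuned encoder-decoder; here it is just a callable.
ChapterModel = Callable[[str, List[str]], List[Tuple[int, str]]]


@dataclass
class Chapter:
    start_token: int  # position of the chapter's first token in the full transcript
    title: str


def chunk_tokens(tokens: List[str], window: int) -> List[Tuple[int, List[str]]]:
    """Split a long transcript into consecutive, non-overlapping windows."""
    return [(i, tokens[i:i + window]) for i in range(0, len(tokens), window)]


def build_input(title: str, description: str,
                prev_titles: List[str], chunk: List[str]) -> str:
    """Prepend episode-level context to one chunk (the field layout is assumed)."""
    prev = " ; ".join(prev_titles) if prev_titles else "none"
    return (f"title: {title} | description: {description} | "
            f"previous chapters: {prev} | transcript: {' '.join(chunk)}")


def chapterize(tokens: List[str], title: str, description: str,
               model: ChapterModel, window: int = 512,
               max_prev: int = 3) -> List[Chapter]:
    """Process chunks left to right, feeding earlier titles into later inputs."""
    chapters: List[Chapter] = []
    for offset, chunk in chunk_tokens(tokens, window):
        # Only the most recent titles are kept (hypothetical cap, not from the paper).
        prev_titles = [c.title for c in chapters[-max_prev:]]
        prompt = build_input(title, description, prev_titles, chunk)
        for local_start, chapter_title in model(prompt, chunk):
            # Map the chunk-local boundary back to a transcript-level position.
            chapters.append(Chapter(offset + local_start, chapter_title))
    return chapters
```

A toy callable such as `lambda prompt, chunk: [(0, chunk[0])]`, which opens a chapter at the start of every chunk, is enough to exercise the loop. Capping the carried-over titles with `max_prev` keeps each input bounded as the episode grows; whether PODTILE truncates its context this way is not stated in the abstract.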
