In this study, we investigate whether speech symbols learned through deep learning follow Zipf's law, as natural language symbols do. Zipf's law is an empirical law describing the frequency distribution of words, and it forms a foundation for statistical analysis in natural language processing. Natural language symbols, which humans invented to represent the content of speech, are known to follow this law. Meanwhile, recent breakthroughs in spoken language processing have produced learned speech symbols: data-driven symbolizations of speech content. Our objective is to determine whether these data-driven speech symbols also obey Zipf's law. Through this investigation, we aim to open new avenues for the statistical analysis of spoken language.
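In its common form, Zipf's law states that the frequency $f(r)$ of the symbol with frequency rank $r$ falls off as a power law, $f(r) \propto r^{-s}$, with $s$ close to 1 for words in natural language. The sketch below shows one simple way to check this for any symbol sequence, whether word tokens or discrete speech-unit IDs. It is not the authors' procedure. The function names and the toy `unit…` IDs are hypothetical. The exponent is estimated by an ordinary least-squares fit of $\log f$ against $\log r$, a common but rough estimator.

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """Counts of each distinct symbol, sorted so index 0 holds rank 1."""
    return sorted(Counter(tokens).values(), reverse=True)


def zipf_exponent(freqs):
    """Least-squares slope of log f(r) vs. log r, returned as s in f(r) ∝ r**(-s)."""
    if len(freqs) < 2:
        raise ValueError("need at least two distinct symbols to fit a slope")
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]  # log rank
    ys = [math.log(f) for f in freqs]                      # log frequency
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return -cov / var  # negate the slope so an ideal Zipf curve gives s ≈ 1


# Hypothetical speech-unit IDs whose counts are exactly 1200 / rank.
tokens = [f"unit{r}" for r in range(1, 7) for _ in range(1200 // r)]
freqs = rank_frequency(tokens)
print(freqs)                            # [1200, 600, 400, 300, 240, 200]
print(round(zipf_exponent(freqs), 6))   # 1.0
```

On real data, the fitted exponent is sensitive to the long tail of rare symbols. For that reason, more careful analyses often use maximum-likelihood estimators instead of a plain log-log regression.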