One precondition of effective oral communication is that words are pronounced clearly, which matters especially for non-native speakers. Word stress is key to clear and correct English pronunciation, and misplaced syllable stress can lead to misunderstandings. Knowing the stress level of each syllable is therefore important for English speakers and learners. This paper presents a self-attention model that identifies the stress level of each syllable in spoken English. We explore various prosodic and categorical features, including the pitch level, intensity, duration, and type of each syllable and of its nucleus (the vowel of the syllable). These features are fed to the self-attention model, which predicts syllable-level stress. The simplest model achieves accuracies of over 88% and 93% on different datasets, and more advanced models achieve higher accuracy. Our study suggests that self-attention models are promising for stress-level detection. Such models could be applied in scenarios such as online meetings and English learning.
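As a rough illustration of the pipeline summarized above (per-syllable features in, syllable-level stress out), the following is a minimal sketch of a single self-attention layer followed by a linear classifier. It is not the paper's architecture: the feature values, the integer coding of the categorical features, the model width `d_model`, the choice of three stress levels, and the helper names `init_params` and `self_attention_stress` are all assumptions made for illustration, and the weights are random rather than trained.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def init_params(n_features, d_model, n_levels, seed=0):
    # Hypothetical random initialisation; a real model would learn these weights.
    rng = np.random.default_rng(seed)
    return {
        "Wq": rng.normal(scale=0.1, size=(n_features, d_model)),
        "Wk": rng.normal(scale=0.1, size=(n_features, d_model)),
        "Wv": rng.normal(scale=0.1, size=(n_features, d_model)),
        "Wo": rng.normal(scale=0.1, size=(d_model, n_levels)),
        "bo": np.zeros(n_levels),
    }

def self_attention_stress(features, params):
    """Map (n_syllables, n_features) features to per-syllable stress probabilities."""
    q = features @ params["Wq"]                   # queries, one per syllable
    k = features @ params["Wk"]                   # keys
    v = features @ params["Wv"]                   # values
    scores = q @ k.T / np.sqrt(k.shape[-1])       # scaled dot-product scores
    attn = softmax(scores, axis=-1)               # each syllable attends to all syllables
    context = attn @ v                            # context-aware syllable representations
    logits = context @ params["Wo"] + params["bo"]
    return softmax(logits, axis=-1), attn

# Made-up features for a four-syllable word. Columns (assumed):
# pitch (Hz), intensity (dB), duration (s), syllable type, nucleus type.
# The categorical columns are crudely integer-coded here; a real model
# would more likely use one-hot vectors or learned embeddings.
raw = np.array([
    [180.0, 62.0, 0.12, 0.0, 1.0],
    [220.0, 70.0, 0.21, 1.0, 3.0],
    [175.0, 60.0, 0.10, 0.0, 1.0],
    [190.0, 64.0, 0.15, 2.0, 2.0],
])
# Standardise each column so that no single feature dominates the dot products.
features = (raw - raw.mean(axis=0)) / (raw.std(axis=0) + 1e-8)

params = init_params(n_features=5, d_model=8, n_levels=3)  # 3 levels is an assumption
probs, attn = self_attention_stress(features, params)
predicted_levels = probs.argmax(axis=1)  # arbitrary here, since weights are untrained
print(predicted_levels)
```

Because each syllable's representation is a weighted mix of every syllable in the word, the classifier can judge stress relative to neighbouring syllables rather than from absolute pitch or duration alone, which is one plausible reason self-attention suits this task.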