We present LIWhiz, a non-intrusive lyric intelligibility prediction system submitted to the ICASSP 2026 Cadenza Challenge. LIWhiz uses Whisper for robust feature extraction and a trainable back-end for score prediction. On the Cadenza Lyric Intelligibility Prediction (CLIP) evaluation set, LIWhiz achieves a root mean square error (RMSE) of 27.07%, a 22.4% relative RMSE reduction over the STOI-based baseline, together with a substantial improvement in normalized cross-correlation.
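To make the reported figures concrete, the following minimal sketch shows how RMSE, relative RMSE reduction, and normalized cross-correlation could be computed from predicted and reference intelligibility scores. It assumes that the relative reduction is defined as (baseline − system) / baseline and that normalized cross-correlation is the zero-mean, Pearson-style correlation; both definitions are assumptions rather than the challenge's confirmed formulas, and the function names are hypothetical.

```python
import math


def rmse(predicted, target):
    """Root mean square error between two equal-length score lists."""
    if len(predicted) != len(target) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - t) ** 2 for p, t in zip(predicted, target)]
    return math.sqrt(sum(squared) / len(squared))


def relative_reduction(baseline_error, system_error):
    """Fractional error reduction relative to the baseline (assumed definition)."""
    return (baseline_error - system_error) / baseline_error


def ncc(predicted, target):
    """Zero-mean normalized cross-correlation (Pearson-style; assumed definition)."""
    if len(predicted) != len(target) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_p = sum(predicted) / len(predicted)
    mean_t = sum(target) / len(target)
    num = sum((p - mean_p) * (t - mean_t) for p, t in zip(predicted, target))
    den = math.sqrt(
        sum((p - mean_p) ** 2 for p in predicted)
        * sum((t - mean_t) ** 2 for t in target)
    )
    if den == 0:
        raise ValueError("correlation is undefined for constant inputs")
    return num / den


# Baseline RMSE implied by the two reported numbers under the assumed
# definition: system = baseline * (1 - reduction).
implied_baseline_rmse = 27.07 / (1 - 0.224)
```

Under these assumptions, `implied_baseline_rmse` is the baseline RMSE that would be consistent with the reported 27.07% and 22.4%; it is derived arithmetic, not a figure stated in the abstract.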