Text generation on image-based platforms, particularly for music-related content, requires precise control over text style and the incorporation of emotional expression. However, existing approaches often struggle to control the proportion of external factors in the generated text and rely on discrete inputs, so they lack continuous control conditions for steering generation toward the desired text. This study proposes Continuous Parameterization for Controlled Text Generation (CPCTG) to overcome these limitations. Our approach uses a language model (LM) as a style learner and integrates Semantic Cohesion (SC) and Emotional Expression Proportion (EEP) considerations. By enhancing the reward method and adjusting the CPCTG level, we run experiments on playlist description and music topic generation tasks; the results show significant improvements in ROUGE scores, indicating greater relevance and coherence in the generated text.
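To make the contrast between discrete inputs and a continuous control condition concrete, the sketch below mixes two per-text scores with a real-valued control level. It is an illustrative assumption, not the CPCTG formulation: the names `blended_reward`, `sweep`, `sc_score`, `eep_score`, and `level` are hypothetical, the scores are assumed to be precomputed and normalized to [0, 1], and the linear interpolation is chosen only for simplicity.

```python
from typing import Iterable, List


def blended_reward(sc_score: float, eep_score: float, level: float) -> float:
    """Hypothetical reward mixing a semantic cohesion (SC) score and an
    emotional expression (EEP) score with a continuous control level.

    level = 0.0 -> the reward depends only on the SC score
    level = 1.0 -> the reward depends only on the EEP score
    Values in between interpolate linearly (an illustrative assumption).
    """
    if not 0.0 <= level <= 1.0:
        raise ValueError("level must lie in [0, 1]")
    return (1.0 - level) * sc_score + level * eep_score


def sweep(sc_score: float, eep_score: float, levels: Iterable[float]) -> List[float]:
    # Score the same candidate text under several control levels.
    return [blended_reward(sc_score, eep_score, lv) for lv in levels]


if __name__ == "__main__":
    # Assumed scores for one candidate playlist description.
    print(sweep(0.8, 0.2, [0.0, 0.25, 0.5, 0.75, 1.0]))
```

A discrete condition would only offer the endpoints (for example "emotional" or "neutral"); a continuous `level` lets the reward, and hence the training signal, move smoothly between them.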