Since the release of ChatGPT, generative models have achieved tremendous success and become the de facto approach for many NLP tasks. However, their application to input methods remains under-explored. Many neural network approaches have been applied to building Chinese input method engines (IMEs). Previous research often assumed that the input pinyin was correct and focused on the Pinyin-to-Character (P2C) task, which falls significantly short of meeting users' needs. Moreover, previous approaches could not leverage user feedback to optimize the model and provide personalized results. In this study, we propose GeneInput, a novel Generative Input paradigm. It uses prompts to handle all input scenarios as well as other intelligent auxiliary input functions, and optimizes the model with user feedback to deliver personalized results. The results show that we achieve state-of-the-art performance on the Full-mode Key-sequence to Characters (FK2C) task for the first time. We also propose a novel reward model training method that requires no additional manual annotation, and the resulting model surpasses GPT-4 on intelligent association and conversational assistance tasks. Compared with traditional paradigms, GeneInput not only performs better but also exhibits greater robustness, scalability, and online learning capability.