ChatGPT, an advanced large language model (LLM), has shown potential across a range of domains and remains highly competitive with other LLMs. This study evaluates the potential of a fine-tuned ChatGPT model as a personal medical assistant in Arabic. To do so, it uses publicly available online question-answering datasets in Arabic, comprising almost 430K questions and answers across 20 disease-specific categories. The GPT-3.5-turbo model was fine-tuned on a portion of this dataset, and the fine-tuned model was assessed through both automated and human evaluation. The automated evaluation covers perplexity, coherence, similarity, and token count. For the human evaluation, native Arabic speakers with medical knowledge rated the generated text on relevance, accuracy, precision, logic, and originality. Overall, the results suggest that ChatGPT is a promising tool for medical assistance.
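As an illustration of three of the automated metrics named above (perplexity, similarity, and token count), the following is a minimal sketch rather than the study's actual procedure, which the abstract does not describe. The function names are hypothetical, the per-token log-probabilities are assumed to be supplied by whichever model scores the text, and whitespace splitting is a simplifying assumption that stands in for a real tokenizer.

```python
import math
from collections import Counter


def perplexity(token_logprobs):
    """Perplexity as exp of the negative mean natural-log probability per token.

    `token_logprobs` is assumed to come from the scoring model (hypothetical input).
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def cosine_similarity(text_a, text_b):
    """Cosine similarity between whitespace-token count vectors of two texts."""
    a, b = Counter(text_a.split()), Counter(text_b.split())
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text shares nothing with any other text
    return dot / (norm_a * norm_b)


def token_count(text):
    """Number of whitespace-separated tokens (a stand-in for a model tokenizer)."""
    return len(text.split())
```

A generated answer could then be compared with its reference answer via `cosine_similarity(generated, reference)`. Embedding-based similarity and a subword tokenizer would likely be more faithful for Arabic text, but they require external libraries and are left out of this sketch.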