We present Eir-8B, a large language model with 8 billion parameters, designed to improve accuracy on medical tasks in the Thai language. The model focuses on providing clear, easy-to-understand answers for both healthcare professionals and patients, thereby improving the efficiency of diagnosis and treatment. Human evaluation was conducted to ensure that the model adheres to standards of care and provides unbiased answers. To prioritize data security, the model is deployed within the hospital's internal network, ensuring both high security and fast processing. The internal API connection is secured with encryption and strict authentication to prevent data leaks and unauthorized access. We evaluated several open-source large language models with 8 billion parameters on four medical benchmarks: MedQA, MedMCQA, PubMedQA, and the medical subset of MMLU. The best-performing baselines were used to develop Eir-8B. Our evaluation employed multiple prompting strategies, including zero-shot, few-shot, chain-of-thought reasoning, and ensemble/self-consistency voting. Our model outperformed commercially available Thai-language large language models by more than 10%. In addition, we developed an enhanced evaluation tailored for clinical use in Thai, covering 18 clinical tasks, on which our model exceeded GPT-4o's performance by more than 11%.
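The self-consistency voting mentioned above is commonly implemented as a majority vote over answers extracted from multiple independently sampled chain-of-thought completions. The following is a minimal sketch of that aggregation step, not the authors' actual implementation; the function name and input format are illustrative assumptions.

```python
from collections import Counter

def self_consistency_vote(sampled_answers):
    """Return the majority answer among independently sampled model outputs.

    sampled_answers: a list of final answer labels (e.g. multiple-choice
    letters) extracted from several chain-of-thought samples for the
    same question. Ties resolve by first-seen order via Counter.
    """
    counts = Counter(sampled_answers)
    # most_common(1) returns [(label, count)]; take the top label.
    return counts.most_common(1)[0][0]

# Example: five sampled reasoning chains yield these final answers.
print(self_consistency_vote(["A", "B", "A", "A", "C"]))  # majority label: A
```

In practice, each element of `sampled_answers` would come from parsing one temperature-sampled generation, so the vote rewards answers the model reaches through multiple distinct reasoning paths.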