Transforming natural language questions into SQL queries is crucial for precise data retrieval from electronic health record (EHR) databases. A significant challenge in this process is detecting and rejecting unanswerable questions that request information beyond the database's scope or exceed the system's capabilities. In this paper, we introduce a novel text-to-SQL framework that robustly handles out-of-domain questions and verifies the generated queries through query execution. Our framework begins by standardizing the structure of questions into a templated format. We use a powerful large language model (LLM), GPT-3.5, fine-tuned with detailed prompts that include the table schemas of the EHR database system. Our experimental results demonstrate the effectiveness of our framework on the EHRSQL-2024 benchmark, a shared task of the ClinicalNLP workshop. Although straightforward fine-tuning of GPT shows promising results on the development set, it struggles with the out-of-domain questions in the test set. With our framework, we improve our system's adaptability and achieve competitive performance on the official leaderboard of the EHRSQL-2024 challenge.