Clinical factors account for only a small portion, about 10-30%, of the controllable factors that affect an individual's health outcomes. The remaining factors include where a person was born and raised, where they pursued their education, and what their work and family environments are like. These factors are collectively referred to as Social Determinants of Health (SDoH). Most SDoH data is recorded by physicians and practitioners in unstructured clinical notes. Recording SDoH data in a structured manner (in an EHR) could greatly benefit from a dedicated ontology of SDoH terms. Our research focuses on extracting sentences from clinical notes, using such an SDoH ontology (called SOHO) to provide appropriate concepts. We utilize recent advancements in Deep Learning to optimize the hyperparameters of a Clinical BioBERT model for SDoH text. We implemented a genetic algorithm-based hyperparameter tuning regimen to identify optimal parameter settings. To build a complete classifier, we followed Clinical BioBERT with two linear layers and two dropout layers. The output predicts whether a text fragment describes an SDoH issue of the patient. We compared the AdamW, Adafactor, and LAMB optimizers. In our experiments, AdamW outperformed the others in terms of accuracy.
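The genetic algorithm-based tuning regimen mentioned above can be illustrated with a minimal sketch. The abstract does not specify the search space, the genetic operators, or the population settings, so everything named here (SEARCH_SPACE, genetic_search, toy_fitness, the candidate values, and the use of elitism with uniform crossover) is an assumption chosen for illustration. In practice the fitness function would presumably train the classifier with the candidate settings and return its validation accuracy; a cheap placeholder is used here so that the example runs on its own.

```python
import random

# Hypothetical search space: the abstract does not list the tuned hyperparameters.
SEARCH_SPACE = {
    "learning_rate": [1e-5, 2e-5, 3e-5, 5e-5],
    "dropout": [0.1, 0.2, 0.3, 0.5],
    "batch_size": [8, 16, 32],
    "hidden_units": [64, 128, 256],
}


def random_individual(rng):
    """Draw one candidate configuration uniformly from the search space."""
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}


def crossover(parent_a, parent_b, rng):
    """Uniform crossover: each gene is copied from one of the two parents."""
    return {name: rng.choice([parent_a[name], parent_b[name]]) for name in SEARCH_SPACE}


def mutate(individual, rng, rate=0.2):
    """Resample each gene from its allowed values with probability `rate`."""
    child = dict(individual)
    for name, values in SEARCH_SPACE.items():
        if rng.random() < rate:
            child[name] = rng.choice(values)
    return child


def genetic_search(fitness, population_size=12, generations=10, elite=2, seed=0):
    """Evolve configurations and return the fittest one in the final population."""
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(population_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: population_size // 2]   # truncation selection
        next_population = ranked[:elite]           # elitism: keep the best unchanged
        while len(next_population) < population_size:
            a, b = rng.sample(parents, 2)
            next_population.append(mutate(crossover(a, b, rng), rng))
        population = next_population
    return max(population, key=fitness)


def toy_fitness(individual):
    """Placeholder score (assumption): stands in for validation accuracy.

    It peaks at an arbitrary configuration so the search has something to find.
    """
    score = 0.0
    score -= abs(individual["learning_rate"] - 3e-5) * 1e4
    score -= abs(individual["dropout"] - 0.3)
    score -= abs(individual["batch_size"] - 16) / 32
    score -= abs(individual["hidden_units"] - 128) / 256
    return score


if __name__ == "__main__":
    best = genetic_search(toy_fitness)
    print("best configuration:", best)
```

Because the best individuals are carried over unchanged, the top fitness never decreases from one generation to the next. With a real training run as the fitness function, caching scores per configuration would avoid retraining the same candidate repeatedly.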