We introduce a methodology for identifying notifiable events in healthcare. The methodology uses semantic frames to define fine-grained patterns and searches for them in unstructured data, namely open-text fields in electronic medical records. We apply it to the problem of underreporting of gender-based violence (GBV) in electronic medical records produced during patients' visits to primary care units. Eight patterns are defined and searched for in a corpus of 21 million sentences in Brazilian Portuguese extracted from e-SUS APS. The results are manually evaluated by linguists, and the precision of each pattern is measured. Our findings show that the methodology identifies reports of violence with a precision of 0.726, which supports its robustness. Designed as a transparent, efficient, low-carbon, and language-agnostic pipeline, the approach can be adapted to other health surveillance contexts, contributing to the broader, ethical, and explainable use of NLP in public health systems.
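To make the evaluation step concrete, the following is a minimal sketch of how per-pattern precision could be computed from manual judgments, where precision is the share of retrieved sentences that reviewers confirm as true reports. The function names, the pattern identifiers (`P1`, `P2`), and the toy annotations are hypothetical and are not taken from the study. The abstract does not say whether the reported 0.726 is pooled over all matches or averaged across patterns, so the pooled variant shown here is only one possible reading.

```python
from collections import defaultdict


def precision_per_pattern(judgments):
    """Return {pattern_id: precision} from (pattern_id, is_correct) pairs.

    Each pair stands for one sentence retrieved by a pattern and the
    reviewer's verdict on whether it is a true positive.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for pattern_id, is_correct in judgments:
        totals[pattern_id] += 1
        if is_correct:
            hits[pattern_id] += 1
    return {p: hits[p] / totals[p] for p in totals}


def pooled_precision(judgments):
    """Precision over all retrieved sentences, regardless of pattern."""
    judgments = list(judgments)
    if not judgments:
        return None  # undefined when nothing was retrieved
    return sum(1 for _, ok in judgments if ok) / len(judgments)


# Hypothetical toy annotations, for illustration only.
toy_judgments = [
    ("P1", True), ("P1", True), ("P1", False),
    ("P2", True), ("P2", False),
]

per_pattern = precision_per_pattern(toy_judgments)
overall = pooled_precision(toy_judgments)
```

Reporting both figures shows whether a single high-volume pattern dominates the pooled score.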