Soil-transmitted helminth (STH) infections continue to affect a large proportion of the global population, particularly in tropical and sub-tropical regions where access to specialized diagnostic expertise is limited. Although manual microscopic examination of parasitic eggs remains the diagnostic gold standard, it can be labour-intensive, time-consuming, and prone to human error. This paper applies a fine-tuned vision-language model (VLM), such as Microsoft Florence, to localize all parasitic eggs within microscopic images. Preliminary results show that the localization VLM achieves a mean intersection over union (mIoU) of 0.94 and outperforms other object detection methods such as EfficientDet. This finding demonstrates the potential of the proposed VLM to serve as a core component of an automated framework, offering a scalable engineering solution for intelligent parasitological diagnosis.
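The abstract reports localization quality as a mean intersection over union of 0.94 without stating how the metric is computed. As a minimal sketch, assuming axis-aligned bounding boxes in `(x_min, y_min, x_max, y_max)` form and a one-to-one pairing of predicted and ground-truth boxes, the metric could be computed as below. The names `box_iou` and `mean_iou` are hypothetical, and the matching and averaging protocol actually used in the paper may differ.

```python
from typing import Sequence, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle, if there is one.
    ix_min = max(a[0], b[0])
    iy_min = max(a[1], b[1])
    ix_max = min(a[2], b[2])
    iy_max = min(a[3], b[3])

    # Clamp to zero so disjoint boxes contribute no intersection area.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


def mean_iou(preds: Sequence[Box], targets: Sequence[Box]) -> float:
    """Average IoU over already-matched (prediction, ground truth) pairs.

    Assumption: preds[i] is paired with targets[i]; how pairs are
    matched is outside the scope of this sketch.
    """
    if len(preds) != len(targets):
        raise ValueError("preds and targets must have the same length")
    if not preds:
        return 0.0
    return sum(box_iou(p, t) for p, t in zip(preds, targets)) / len(preds)
```

For example, `box_iou((0, 0, 2, 2), (1, 1, 3, 3))` has an overlap of area 1 and a union of area 7, so the two boxes score 1/7; a perfectly aligned prediction scores 1.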