This study aims to establish a computer-aided diagnostic system for lung lesions in endobronchial ultrasound (EBUS) to assist physicians in identifying lesion areas. During EBUS-transbronchial needle aspiration (EBUS-TBNA) procedures, physicians rely on grayscale ultrasound images to locate lesions. However, these images often contain significant noise and can be affected by surrounding tissues or blood vessels, making identification challenging. Object detection models have not previously been applied to EBUS-TBNA, and no well-defined solution exists for the scarcity of annotated data in EBUS-TBNA datasets. In related studies on ultrasound images, although models have successfully captured target regions for their respective tasks, their training and prediction have been based on two-dimensional images, limiting their ability to exploit temporal features for improved predictions. This study introduces a three-dimensional, video-based object detection model. It first generates a set of improved queries using a diffusion model, then captures temporal correlations through an attention mechanism; a filtering mechanism selects relevant information from previous frames to pass to the current frame. A teacher-student training approach is then employed to further optimize the model using unlabeled data, and the incorporation of various data augmentations and feature alignment makes the model robust to interference. Test results demonstrate that this model, which captures spatiotemporal information and employs semi-supervised learning, achieves an Average Precision (AP) of 48.7 on the test dataset, outperforming other models, and an Average Recall (AR) of 79.2, substantially ahead of existing models.
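The frame-to-frame filtering step described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the function name `filter_previous_queries`, the scoring rule (maximum attention weight from the current frame), and the `top_k` cutoff are all assumptions made for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def filter_previous_queries(curr_q, prev_q, top_k=2):
    """Score each previous-frame query by its maximum attention
    weight from any current-frame query, and keep only the top_k
    most relevant ones to propagate to the current frame.
    (Hypothetical sketch of the paper's filtering mechanism.)"""
    d = curr_q.shape[-1]
    # Scaled dot-product attention: (N_curr, N_prev) weight matrix.
    attn = softmax(curr_q @ prev_q.T / np.sqrt(d), axis=-1)
    relevance = attn.max(axis=0)          # best score per previous query
    keep = np.argsort(relevance)[::-1][:top_k]
    return prev_q[keep], keep

rng = np.random.default_rng(0)
curr = rng.standard_normal((4, 8))        # 4 queries in the current frame
prev = rng.standard_normal((6, 8))        # 6 queries from the previous frame
kept, idx = filter_previous_queries(curr, prev, top_k=2)
print(kept.shape)                         # (2, 8)
```

In a full model the kept queries would be concatenated with (or cross-attended by) the current frame's queries so that temporal context survives across frames.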
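The teacher-student semi-supervised stage can likewise be sketched in a few lines. This is a generic illustration of the common recipe (an exponential-moving-average teacher that produces confidence-filtered pseudo-labels for the student), assumed here since the abstract does not give details; the names `ema_update`, `select_pseudo_labels`, the momentum of 0.999, and the 0.7 confidence threshold are all placeholders.

```python
import numpy as np

def ema_update(teacher_w, student_w, momentum=0.999):
    """Exponential-moving-average update of the teacher's weights
    toward the student's weights after each training step."""
    return momentum * teacher_w + (1.0 - momentum) * student_w

def select_pseudo_labels(scores, boxes, conf_thresh=0.7):
    """Keep only teacher detections confident enough to serve as
    pseudo-labels when training the student on unlabeled frames."""
    keep = scores >= conf_thresh
    return boxes[keep], scores[keep]

# Teacher predictions on an unlabeled frame (scores + xyxy boxes).
scores = np.array([0.9, 0.4, 0.8, 0.2])
boxes = np.array([[10, 10, 50, 50],
                  [ 0,  0,  5,  5],
                  [20, 30, 60, 80],
                  [ 1,  1,  2,  2]], dtype=float)
pseudo_boxes, pseudo_scores = select_pseudo_labels(scores, boxes)
print(len(pseudo_boxes))                  # 2 confident detections survive

# Teacher weights drift slowly toward the student's.
print(ema_update(1.0, 0.0, momentum=0.9))  # 0.9
```

The student is typically fed a strongly augmented view of the frame while the teacher sees a weak augmentation, which is one way the "various data augmentations" mentioned above contribute robustness.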