A Dataset for Deep Learning-based Bone Structure Analyses in Total Hip Arthroplasty

Total hip arthroplasty (THA) is a widely used surgical procedure in orthopedics. Before a THA procedure, it is clinically important to analyze the bone structure in CT images, in particular the structure of the acetabulum and femoral head. Deep learning is promising for such bone structure analyses, but it requires high-quality labeled data, and labeling that data is costly. We address this issue by proposing an efficient data annotation pipeline for producing a deep learning-oriented dataset. Our pipeline consists of three stages: non-learning-based bone extraction (BE), non-learning-based acetabulum and femoral head segmentation (AFS), and active-learning-based annotation refinement (AAR). For BE, we use the classic graph-cut algorithm. For AFS, we propose an improved algorithm that comprises femoral head boundary localization using first-order and second-order gradient regularization, line-based non-maximum suppression, and anatomy prior-based femoral head extraction. For AAR, we refine the algorithm-produced pseudo labels with the help of trained deep models: we measure uncertainty as the disagreement between the original pseudo labels and the deep model predictions, and then select the samples with the largest uncertainty for manual labeling. Using the proposed pipeline, we construct a large-scale bone structure analysis dataset from more than 300 diverse clinical CT scans. We carefully label the test set of our data by hand. We then benchmark multiple state-of-the-art deep learning-based medical image segmentation methods on the training and test sets of our data. The extensive experimental results validate the efficacy of the proposed data annotation pipeline. The dataset, related code, and models will be publicly available at https://github.com/hitachinsk/THA.
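The sample selection step in AAR can be illustrated with a minimal sketch. The abstract does not specify how disagreement is computed, so the sketch assumes one plausible choice: one minus the mean Dice overlap between the pseudo label and the model prediction, averaged over foreground classes. The class encoding (0 = background, 1 = acetabulum, 2 = femoral head), the function names, and the `budget` parameter are all hypothetical, and label maps are flattened into plain lists to keep the example dependency-free.

```python
from typing import Dict, List, Sequence


def dice_disagreement(pseudo: Sequence[int], pred: Sequence[int],
                      num_classes: int = 3) -> float:
    """Return 1 - mean foreground Dice (0.0 = identical, 1.0 = no overlap).

    Assumed encoding: 0 = background, 1 = acetabulum, 2 = femoral head.
    """
    if len(pseudo) != len(pred):
        raise ValueError("label maps must have the same number of voxels")
    scores = []
    for c in range(1, num_classes):  # skip background (class 0)
        a = sum(1 for v in pseudo if v == c)
        b = sum(1 for v in pred if v == c)
        if a + b == 0:
            continue  # class absent from both maps: nothing to compare
        inter = sum(1 for p, q in zip(pseudo, pred) if p == c and q == c)
        scores.append(2.0 * inter / (a + b))
    if not scores:
        return 0.0
    return 1.0 - sum(scores) / len(scores)


def select_for_manual_labeling(pseudo_labels: Dict[str, Sequence[int]],
                               predictions: Dict[str, Sequence[int]],
                               budget: int) -> List[str]:
    """Rank samples by disagreement and return the `budget` most uncertain IDs."""
    uncertainty = {
        sid: dice_disagreement(pseudo_labels[sid], predictions[sid])
        for sid in pseudo_labels
    }
    # Highest uncertainty first; ties are broken by sample ID for determinism.
    ranked = sorted(uncertainty, key=lambda s: (-uncertainty[s], s))
    return ranked[:budget]


if __name__ == "__main__":
    # Toy flattened label maps; real inputs would be 3D CT label volumes.
    pseudo = {
        "scan_a": [0, 1, 1, 2, 2, 0],
        "scan_b": [0, 1, 1, 2, 2, 0],
        "scan_c": [0, 1, 1, 2, 2, 0],
    }
    pred = {
        "scan_a": [0, 1, 1, 2, 2, 0],  # full agreement
        "scan_b": [0, 1, 0, 2, 2, 0],  # one voxel differs
        "scan_c": [0, 2, 2, 1, 1, 0],  # classes swapped
    }
    print(select_for_manual_labeling(pseudo, pred, budget=2))
```

In this toy setup the scan whose predicted classes are swapped relative to its pseudo label ranks first, followed by the scan with a single differing voxel; the scan with full agreement is left to its pseudo label. Other disagreement measures, such as voxel-wise error rate or boundary distance, would slot into the same ranking scheme.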
