Literature Digest: Deep Learning for Breast Cancer Diagnosis | Robust breast cancer detection in mammography and digital breast tomosynthesis using an annotation-efficient deep learning approach
麦田医学 美好事物中转站 2024-03-19 10:27
Title
Robust breast cancer detection in mammography and digital breast tomosynthesis using an annotation-efficient deep learning approach
01
Digest Introduction
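Breast cancer remains a major global challenge, causing roughly 600,000 deaths each year. To detect cancer earlier, health organizations worldwide recommend screening mammography, which studies suggest can reduce breast cancer mortality by 20–40%. Although the value of screening is widely recognized, high false-positive and false-negative rates and uneven availability of expert readers limit its potential. Recent efforts to apply deep learning to mammography have highlighted two key difficulties: obtaining large amounts of high-quality annotated training data, and generalizing across populations, acquisition equipment and imaging modalities. To address them, we present an annotation-efficient deep learning approach that achieves state-of-the-art performance in mammogram classification and extends successfully to digital breast tomosynthesis (DBT; "3D mammography"). The system detects cancers in clinically negative prior mammograms of cancer patients, generalizes well to a population with low screening rates, and outperforms five out of five full-time breast-imaging specialists. By generating "maximum suspicion projection" (MSP) images from DBT data and using a progressively trained multiple-instance learning approach, we train effectively on DBT exams using only breast-level labels while retaining localization-based interpretability. Altogether, our results show promise for software that can improve the accuracy of, and access to, screening mammography worldwide.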
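Training "using only breast-level labels" rests on multiple-instance learning (MIL): each breast is treated as a bag of candidate regions, and only the label of the whole bag is known. The following is a minimal Python sketch, not the authors' implementation, assuming max-pooling over per-box malignancy probabilities as the bag score and binary cross-entropy as the loss; the function names are hypothetical.

```python
import math
from typing import Sequence


def bag_score(box_scores: Sequence[float]) -> float:
    """Breast-level score = maximum over per-box malignancy probabilities (assumed MIL pooling)."""
    if not box_scores:
        return 0.0  # hypothetical convention: no candidate boxes means no suspicion
    return max(box_scores)


def bce_loss(prob: float, label: int, eps: float = 1e-7) -> float:
    """Binary cross-entropy between a bag probability and a breast-level label (0 or 1)."""
    p = min(max(prob, eps), 1.0 - eps)  # clamp so log() never sees 0
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def mil_batch_loss(bags: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean loss over a batch of bags; no box-level annotations are needed, only one label per bag."""
    losses = [bce_loss(bag_score(b), y) for b, y in zip(bags, labels)]
    return sum(losses) / len(losses)


# Toy batch: one breast labeled malignant (1), one labeled negative (0).
bags = [[0.05, 0.90, 0.20], [0.10, 0.15]]
labels = [1, 0]
print(round(mil_batch_loss(bags, labels), 4))  # prints 0.1339
```

Because the bag score is the score of a single box, each breast-level prediction can be traced back to the region that produced it, which is one way a weakly supervised model can keep localization-based interpretability. In a differentiable framework the gradient of the maximum reaches only the top-scoring box.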
Fig

Fig. 1 | Model training approach and data summary. a, To make effective use of both strongly and weakly labeled data while mitigating overfitting, the model was trained in a series of stages. Stage 1 is patch-level classification on image patches cropped from 2D mammograms, which gives the network an initial feature-extraction capability. In Stage 2, the feature backbone learned in Stage 1 initializes a detection network, which is then trained end-to-end on full images under strong supervision. Stage 3 adds weakly supervised training for both 2D and 3D mammography. For 2D, the detection network is trained for binary classification in a multiple-instance learning framework. For 3D, the model evaluates the DBT slices, selects the most suspicious regions of interest, and condenses each stack into a 2D "maximum suspicion projection" (MSP) image; the MSP images are then used for further training. b, Summary of the training and testing datasets used in this study. c, Overview of the exam definitions used in this study.
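For 3D, Stage 3 condenses each DBT stack into one 2D image by keeping, at every location, the content of the most suspicious slice. The plain-Python sketch below is a deliberately simplified per-pixel version of that idea. It assumes a hypothetical per-slice suspicion map with the same shape as each slice (for example, detector scores rasterized onto the image grid); the caption describes selecting regions of interest rather than single pixels, so this illustrates the principle and is not the published method.

```python
from typing import List

Slice = List[List[float]]  # one DBT slice (or suspicion map) as rows of values


def max_suspicion_projection(slices: List[Slice], suspicion: List[Slice]) -> Slice:
    """Collapse a DBT stack into one 2D image.

    At each (row, col), copy the pixel from the slice whose (hypothetical) suspicion
    score is highest at that location. Ties go to the lowest slice index.
    """
    if not slices or len(slices) != len(suspicion):
        raise ValueError("need one suspicion map per slice")
    n_rows, n_cols = len(slices[0]), len(slices[0][0])
    msp: Slice = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            # index of the slice with the highest suspicion at this x-y location
            best = max(range(len(slices)), key=lambda z: suspicion[z][r][c])
            row.append(slices[best][r][c])
        msp.append(row)
    return msp


# Toy 3-slice stack of 2x2 images with matching suspicion maps.
slices = [
    [[1, 2], [3, 4]],
    [[10, 20], [30, 40]],
    [[100, 200], [300, 400]],
]
suspicion = [
    [[0.9, 0.1], [0.2, 0.1]],
    [[0.1, 0.8], [0.3, 0.2]],
    [[0.0, 0.3], [0.7, 0.05]],
]
print(max_suspicion_projection(slices, suspicion))  # prints [[1, 20], [300, 40]]
```

The resulting single 2D image can then be fed to the same kind of 2D classifier used for mammograms, which is what lets the breast-level labels of DBT exams be reused for training.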

Fig. 2 | Reader study results. a, Index cancer exams and confirmed negatives.
i, The proposed deep learning model outperformed all five radiologists on a set of 131 index cancer exams and 154 confirmed negatives. Each data point represents a single reader, and the ROC curve represents the performance of the deep learning model. The cross marks the mean radiologist performance, and its arms span the 95% confidence intervals.
ii, Sensitivity of each reader and the corresponding sensitivity of the model at a specificity chosen to match that reader.
iii, Specificity of each reader and the corresponding specificity of the model at a sensitivity chosen to match that reader.
b, Pre-index cancer exams and confirmed negatives.
i, The model also outperformed all five radiologists on the early-detection task. The dataset consisted of 120 pre-index cancer exams (mammograms interpreted as negative 12–24 months before the index exam in which cancer was found) and 154 confirmed negatives. The cross marks the mean radiologist performance, and its arms span the 95% confidence intervals.
ii, Sensitivity of each reader and the corresponding sensitivity of the model at a specificity chosen to match that reader.
iii, Specificity of each reader and the corresponding specificity of the model at a sensitivity chosen to match that reader.
For the sensitivity and specificity tables, the standard deviation of the model-minus-reader difference was calculated by bootstrapping.
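Panels ii and iii of Fig. 2 compare the model with each reader at a matched operating point, and the bootstrap estimates how much the model-minus-reader difference varies. A minimal sketch of both steps for the sensitivity comparison (panel ii) follows, assuming that the model calls a case positive when its score exceeds a threshold, that each reader's recall decisions are available as booleans, and that cancers and negatives are resampled separately. The function names and toy data are hypothetical, and the paper's exact bootstrap design may differ.

```python
import random
import statistics
from typing import List, Sequence


def threshold_at_specificity(neg_scores: Sequence[float], target_spec: float) -> float:
    """Lowest cut-off t whose specificity (share of negatives with score <= t) reaches target_spec."""
    for t in [float("-inf")] + sorted(set(neg_scores)):
        if sum(s <= t for s in neg_scores) / len(neg_scores) >= target_spec:
            return t
    return max(neg_scores)  # unreachable for target_spec <= 1; kept as a safe fallback


def sensitivity(pos_scores: Sequence[float], t: float) -> float:
    """Share of cancer cases the model calls positive (score > t)."""
    return sum(s > t for s in pos_scores) / len(pos_scores)


def matched_sensitivity_gap(pos_scores: Sequence[float], neg_scores: Sequence[float],
                            reader_pos_calls: Sequence[bool],
                            reader_neg_calls: Sequence[bool]) -> float:
    """Model sensitivity minus reader sensitivity, with the model set to the reader's specificity."""
    reader_spec = sum(not c for c in reader_neg_calls) / len(reader_neg_calls)
    reader_sens = sum(reader_pos_calls) / len(reader_pos_calls)
    t = threshold_at_specificity(neg_scores, reader_spec)
    return sensitivity(pos_scores, t) - reader_sens


def bootstrap_sd_of_gap(pos_scores, neg_scores, reader_pos_calls, reader_neg_calls,
                        n_boot: int = 1000, seed: int = 0) -> float:
    """Standard deviation of the sensitivity gap over case-level bootstrap resamples."""
    rng = random.Random(seed)
    n_pos, n_neg = len(pos_scores), len(neg_scores)
    gaps: List[float] = []
    for _ in range(n_boot):
        # resample cancers and negatives separately; each case keeps its model score and reader call
        ip = [rng.randrange(n_pos) for _ in range(n_pos)]
        ineg = [rng.randrange(n_neg) for _ in range(n_neg)]
        gaps.append(matched_sensitivity_gap(
            [pos_scores[i] for i in ip], [neg_scores[i] for i in ineg],
            [reader_pos_calls[i] for i in ip], [reader_neg_calls[i] for i in ineg]))
    return statistics.stdev(gaps)


# Toy data for one reader (hypothetical): 4 cancers, 5 negatives.
pos = [0.9, 0.8, 0.35, 0.6]
neg = [0.1, 0.2, 0.3, 0.4, 0.7]
reader_pos = [True, True, False, False]          # reader sensitivity 0.5
reader_neg = [False, False, False, True, False]  # reader specificity 0.8
print(matched_sensitivity_gap(pos, neg, reader_pos, reader_neg))  # prints 0.25
print(bootstrap_sd_of_gap(pos, neg, reader_pos, reader_neg, n_boot=500))
```

Panel iii is the mirror image: fix the model's sensitivity at each reader's sensitivity and compare specificities. Resampling cancers and negatives separately keeps the number of each fixed in every replicate, as in the reader study.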

Fig. 3 | Examples of index and pre-index cancer exam pairs. Images from three patients with biopsy-proven malignancies are displayed. For each patient, an image from the index exam in which the cancer was discovered is shown on the right, and an image from the prior screening exam, acquired 12–24 months earlier and interpreted as negative, is shown on the left. From top to bottom, the number of days between the index and pre-index exams is 378, 629 and 414. The dots below each image indicate reader and model performance: the number of filled black dots is how many of the five readers correctly classified the case, and the number of filled red dots is how many times the model would correctly classify the case if its score threshold were set individually to match the specificity of each reader. The model is thus evaluated at five binary decision thresholds for comparison purposes; a different threshold may be used in practice. Red boxes on the images indicate the model's bounding-box output, and white arrows indicate the location of the malignant lesion. The absence of a red bounding box indicates that the model did not detect the cancer. a, A cancer that was correctly classified by all readers and by the model at all thresholds in the index case, but detected only by the model in the pre-index case. b, A cancer that was detected by the model in both the pre-index and index cases, but by only one reader in the index case and by no readers in the pre-index case. c, A cancer that was detected by the readers and the model in the index case, but by only one reader in the pre-index case.
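The dot scheme in Fig. 3 can be stated precisely in a few lines. This sketch assumes that a case is called positive when the model score exceeds a threshold and that the five reader-matched thresholds are already known; the function name, scores and thresholds below are hypothetical.

```python
from typing import Sequence, Tuple


def dot_counts(case_score: float, reader_thresholds: Sequence[float],
               reader_calls: Sequence[bool], is_cancer: bool = True) -> Tuple[int, int]:
    """Return (filled black dots, filled red dots) for one exam.

    reader_calls[i] is reader i's binary call (True = recall).
    reader_thresholds[i] is the model threshold chosen to match reader i's specificity.
    """
    if len(reader_thresholds) != len(reader_calls):
        raise ValueError("one model threshold per reader is expected")
    black = sum(call == is_cancer for call in reader_calls)                 # correct reader calls
    red = sum((case_score > t) == is_cancer for t in reader_thresholds)    # correct model calls
    return black, red


# Hypothetical pre-index cancer: one of five readers recalled it; the model score is 0.58.
print(dot_counts(0.58, [0.30, 0.42, 0.55, 0.61, 0.75], [False, True, False, False, False]))
# prints (1, 3)
```

A reader with higher specificity gets a higher matched threshold, so a moderately scored cancer can count as a hit for some readers' operating points but not others, which is why the number of red dots varies between cases.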
Table

Table 1 | Summary of additional digital mammography (DM) and DBT evaluations
