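Literature Digest | Deep Learning for Liver Tumor Diagnosis: Deep Learning-Based Classification of Hepatocellular Nodular Lesions on Whole-Slide Histopathologic Images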
Title
Deep Learning Model for the Identification and Categorization of Hepatocellular Nodular Lesions on Whole-Slide Histopathologic Images
Background
Hepatocellular nodular lesions (HNLs) are a heterogeneous group of disorders. Distinguishing among them can be difficult, particularly separating high-grade dysplastic nodules (HGDNs) from well-differentiated hepatocellular carcinoma (WD-HCC), and the task is harder still in biopsy specimens. We set out to build a deep learning system that addresses this diagnostic challenge and improves the histopathologic assessment of HNLs (WD-HCC, HGDN, low-grade dysplastic nodule [LGDN], focal nodular hyperplasia [FNH], and hepatocellular adenoma [HCA]) as well as of background tissues (nodular cirrhosis [NC] and normal liver tissue).
Conclusions
We developed a deep learning-based diagnostic model for HNLs. The model performed well and could play an important role in improving the diagnostic rate of early-stage HCC and in the risk stratification of patients with HNLs. HnAIM also showed clear strengths in patch-level recognition. This is of particular diagnostic value when biopsy samples are limited or fragmented.
Results
In this study, 213,280 patches were extracted from 1,115 whole-slide images (WSIs) of 738 patients, and several classification models were trained and evaluated. The best-performing model was selected using the F1 score and the area under the curve (AUC) and was named the hepatocellular-nodular artificial intelligence model (HnAIM). In an independent external validation cohort, HnAIM achieved an AUC of 0.935 across the 7 categories. On biopsy specimens, HnAIM agreed more closely with the majority opinion of subspecialists than 9 pathologists did. This held at both the patch level and the WSI level.
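The summary says the best model was chosen using the F1 score and AUC, but it does not say how the 7-class metrics were averaged or combined. The sketch below therefore makes three assumptions: macro-averaged F1, the mean of one-vs-rest AUCs, and a hypothetical selection rule in which the highest macro-F1 wins and AUC breaks ties. All function and model names are illustrative and do not come from the study.

```python
from typing import Dict, List, Sequence, Tuple


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int], n_classes: int) -> float:
    """Unweighted mean of the per-class F1 scores."""
    scores: List[float] = []
    for c in range(n_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n_classes


def binary_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC = probability that a random positive scores above a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(y_true: Sequence[int], probs: Sequence[Sequence[float]], n_classes: int) -> float:
    """Mean one-vs-rest AUC; probs[i][c] is the predicted probability of class c for sample i."""
    aucs = [
        binary_auc([1 if t == c else 0 for t in y_true], [row[c] for row in probs])
        for c in range(n_classes)
    ]
    return sum(aucs) / n_classes


def select_best(results: Dict[str, Tuple[float, float]]) -> str:
    """Hypothetical rule: highest macro-F1, AUC as tiebreaker (tuples compare element-wise)."""
    return max(results, key=lambda name: results[name])
```

The pairwise AUC is quadratic in the number of samples. That keeps the definition transparent but is only practical for small illustrations. A library routine would be the usual choice at the scale of 213,280 patches.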
Method
Surgical and biopsy specimens were collected from 6 hospitals, and each specimen was reviewed by 2 to 3 subspecialists. Four deep neural networks were used to analyze the samples: ResNet50, InceptionV3, Xception, and an ensemble of these three. Model performance was assessed with confusion matrices, receiver operating characteristic (ROC) curves, classification maps, and heat maps. The diagnostic performance of 9 pathologists was then compared with that of the optimal model to further validate its effectiveness.
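The digest does not say how the Ensemble combines its 3 base networks. A common option is soft voting, which averages the class probabilities of the base models. The sketch below assumes that approach. Its input is each network's raw 7-class logits for one patch; the networks themselves are out of scope here, and the function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw scores to probabilities; subtracting the max avoids overflow in exp."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_vote(per_model_logits: Sequence[Sequence[float]],
              weights: Optional[Sequence[float]] = None) -> List[float]:
    """Weighted average of each base model's class probabilities (equal weights by default).

    Weights are assumed to sum to 1 so the result is still a probability vector.
    """
    n_models = len(per_model_logits)
    if weights is None:
        weights = [1.0 / n_models] * n_models
    probs = [softmax(logits) for logits in per_model_logits]
    n_classes = len(probs[0])
    return [sum(w * p[c] for w, p in zip(weights, probs)) for c in range(n_classes)]


def ensemble_predict(per_model_logits: Sequence[Sequence[float]],
                     weights: Optional[Sequence[float]] = None) -> int:
    """Index of the class with the highest averaged probability."""
    avg = soft_vote(per_model_logits, weights)
    return max(range(len(avg)), key=avg.__getitem__)
```

Averaging probabilities rather than hard votes lets a confident base model outweigh two uncertain ones. Optional weights, for example tuned on the validation set, are one way to favor a stronger network. Whether the study weighted its models is not stated.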
Figure

Figure 1. Data, study design, and HnAIM classification framework. Six independent data sets were used in this study: the Headquarters, Lingnan Hospital, and Yuedong Hospital of SYSUTH; SYSUFH; FSFPH; and GZFPH. (A) The Headquarters and Yuedong Hospital of SYSUTH data sets were used to develop a 7-category discriminative model, and the other 4 data sets were used for external testing. (B) Distribution of samples for each type of liver nodule in model development (left) and independent external validation (right). (C) Flowchart of the study. The data sets of the 7 categories were divided into training (70%), validation (15%), and testing (15%) sets. Regions of interest (ROIs) were then labeled with green masks for each category. Patches of 1024 × 1024 pixels were extracted from the ROIs at 40× magnification with the OpenSlide library. The training set was used to train the ensemble model built on 3 base models. The validation set was used to fine-tune hyperparameters such as the learning rate. The testing set was used to evaluate model performance by confusion matrix, ROC curve, WSI-level classification map, and patch-level heat map. Patches of liver biopsy specimens were classified by the optimal model and displayed as histograms, and the model's referral decisions were compared with those of pathologists at different experience levels.
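The patch-extraction and data-splitting steps in Figure 1 can be sketched as follows. The sketch makes several assumptions: the ROI is a rectangle in level-0 pixel coordinates, level 0 of the slide corresponds to 40× magnification, tiles are non-overlapping 1024 × 1024 squares, and the 70/15/15 split is a seeded random split over slide or patient IDs. The study's actual mask handling, stride, and split unit are not given in this digest, and `patch_origins` and `split_ids` are hypothetical helper names. The commented OpenSlide call uses the library's `read_region(location, level, size)` method.

```python
import random
from typing import Dict, Iterator, List, Sequence, Tuple

PATCH = 1024  # patch edge in pixels; assumes slide level 0 is the 40x scan


def patch_origins(roi: Tuple[int, int, int, int], patch: int = PATCH) -> Iterator[Tuple[int, int]]:
    """Yield top-left (x, y) corners of non-overlapping patches that fit entirely
    inside a rectangular ROI given as (x0, y0, width, height) in level-0 pixels."""
    x0, y0, width, height = roi
    for y in range(y0, y0 + height - patch + 1, patch):
        for x in range(x0, x0 + width - patch + 1, patch):
            yield x, y


# Each origin could then be read with OpenSlide (hypothetical file name), e.g.:
#   slide = openslide.OpenSlide("slide.svs")
#   tile = slide.read_region((x, y), 0, (PATCH, PATCH)).convert("RGB")


def split_ids(ids: Sequence[str], seed: int = 0,
              fractions: Tuple[float, float, float] = (0.70, 0.15, 0.15)) -> Dict[str, List[str]]:
    """Seeded random split of slide or patient IDs into train/val/test.

    Splitting by ID (rather than by patch) keeps patches from one slide in a single set.
    """
    shuffled = sorted(ids)  # sort first so the result does not depend on input order
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * fractions[0])
    n_val = round(len(shuffled) * fractions[1])
    return {
        "train": shuffled[:n_train],
        "val": shuffled[n_train:n_train + n_val],
        "test": shuffled[n_train + n_val:],
    }
```

A real pipeline would typically also check mask coverage inside each tile, because the ROIs in the study are drawn as masks rather than rectangles.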

Figure 2. Performance of the deep learning models. (A) Confusion matrices of classification results on the internal testing set for ResNet50, Inception V3, Xception, and the ensemble model. Numbers indicate correctly classified (diagonal) and misclassified (off-diagonal) counts. (B) ROC curves and AUCs for ResNet50 (black), Inception V3 (blue), Xception (green), and the ensemble model (red). Xception and the ensemble model performed best, with AUCs of approximately 0.9991, indicating high accuracy on the internal testing set. (C) ROC curves and AUCs of the ensemble model HnAIM in independent external validation on FSFPH, SYSUFH, GZFPH, and all external validation data sets combined.
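In the confusion matrices of Figure 2A, diagonal cells count correct classifications and off-diagonal cells count errors. Below is a minimal sketch of how such a matrix, and the per-class recall and overall accuracy derived from it, can be computed. It assumes integer class labels 0 to 6 for the 7 categories, in an arbitrary order, and the helper names are hypothetical.

```python
from typing import List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int], n_classes: int) -> List[List[int]]:
    """cm[t][p] counts samples of true class t predicted as class p; the diagonal holds correct calls."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def per_class_recall(cm: List[List[int]]) -> List[float]:
    """Diagonal count divided by the row total (0.0 for classes absent from y_true)."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(cm)]


def accuracy(cm: List[List[int]]) -> float:
    """Share of all samples that fall on the diagonal."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total if total else 0.0


def format_matrix(cm: List[List[int]]) -> str:
    """Plain-text rendering with true classes as rows and predicted classes as columns."""
    width = max(len(str(v)) for row in cm for v in row)
    return "\n".join(" ".join(str(v).rjust(width) for v in row) for row in cm)
```

Row-normalizing the matrix gives per-class recall. This is often easier to compare across categories than raw counts when the 7 classes are imbalanced.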

Figure 3. WSI-level classification maps of surgical specimens: (A) WD-HCC, (B) HGDN, (C) LGDN, (D) FNH, and (E) HCA. Left panels show the original WSIs at 0.4× magnification. Middle panels show classification maps generated from the model's predictions for the corresponding slide regions. The blue-to-red color gradient reflects the lesion category, and for NC, LGDN, HGDN, and WD-HCC (labels 2 and 5–7) the color shifts with increasing grade of malignancy. Right panels show pie charts of the proportion of each category in each WSI. Diagnostic labels: 0, background; 1, NNLL (normal liver tissue); 2, NC; 3, HCA; 4, FNH; 5, LGDN; 6, HGDN; 7, WD-HCC.
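The classification map and pie chart in Figure 3 can be approximated by placing each patch's predicted label on a grid and then counting labels. The sketch below uses the label codes from the caption and treats grid cells without a prediction as background (0). Background is excluded from the proportions, and colors come from a simple linear blue-to-red ramp. The ramp, the grid layout, and the helper names are assumptions and do not describe how the figure was actually rendered.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Label codes as given in the Figure 3 caption.
LABELS = {0: "background", 1: "NNLL", 2: "NC", 3: "HCA", 4: "FNH",
          5: "LGDN", 6: "HGDN", 7: "WD-HCC"}


def build_map(preds: Dict[Tuple[int, int], int], n_rows: int, n_cols: int) -> List[List[int]]:
    """Place each patch's predicted label at its (row, col) cell; cells without a prediction stay 0."""
    grid = [[0] * n_cols for _ in range(n_rows)]
    for (r, c), label in preds.items():
        grid[r][c] = label
    return grid


def category_proportions(grid: List[List[int]]) -> Dict[str, float]:
    """Share of each non-background category among labelled cells (the pie-chart values)."""
    counts = Counter(v for row in grid for v in row if v != 0)
    total = sum(counts.values())
    return {LABELS[k]: counts[k] / total for k in sorted(counts)} if total else {}


def label_color(label: int) -> Tuple[int, int, int]:
    """Assumed linear ramp: label 1 -> pure blue, label 7 -> pure red; background -> white."""
    if label == 0:
        return (255, 255, 255)
    t = (label - 1) / 6
    return (round(255 * t), 0, round(255 * (1 - t)))
```

Each grid cell stands for one 1024 × 1024 patch. Because of this, the map's resolution is set by the patch size rather than by the slide's pixel resolution.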

Figure 4. Performance of HnAIM on biopsy specimens and comparison with pathologists. (A) Patch-level histograms of biopsy specimens showing the model's predictions across the 7 categories, based on cell morphologic features. The category with the highest proportion was taken as the final classification. Agreement rates with the subspecialists' majority opinion were assessed for HnAIM and for pathologists (3 at each level: junior, intermediate, and senior) across (B) all 961 patches and (C) 30 WSIs of biopsy specimens. To represent the typical performance of each group, agreement rates are shown as the mean of the 3 pathologists' rates. Error bars indicate 95% CIs. Several factors may explain disagreement between pathologists and HnAIM: the inherent uncertainty of interpreting a 3-dimensional structure from a 2-dimensional section, ambiguities in diagnostic guidelines, limited tissue availability, and cognitive factors such as anchoring bias.
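Figure 4 involves three computations: taking the most frequent patch-level category as the WSI call, measuring agreement with the subspecialists' majority opinion, and averaging the rates of the 3 pathologists in each group. The sketch below illustrates all three. The digest does not specify the 95% CI method, so a Wilson score interval is assumed. The tie-breaking rule in `wsi_label` (lowest label id wins) and all function names are hypothetical.

```python
import math
from collections import Counter
from typing import Sequence, Tuple


def wsi_label(patch_preds: Sequence[int]) -> int:
    """WSI call = category with the most patches; ties go to the lowest label id (hypothetical rule)."""
    counts = Counter(patch_preds)
    return max(sorted(counts), key=lambda k: counts[k])


def agreement_rate(calls: Sequence[int], reference: Sequence[int]) -> float:
    """Fraction of cases where a reader's call matches the subspecialists' majority opinion."""
    if len(calls) != len(reference) or not reference:
        raise ValueError("calls and reference must be non-empty and of equal length")
    return sum(a == b for a, b in zip(calls, reference)) / len(reference)


def group_mean_rate(group_calls: Sequence[Sequence[int]], reference: Sequence[int]) -> float:
    """Mean agreement across the pathologists of one level (3 per level in Figure 4)."""
    rates = [agreement_rate(calls, reference) for calls in group_calls]
    return sum(rates) / len(rates)


def wilson_ci(k: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for k agreements out of n cases (assumed CI method)."""
    p = k / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)
```

The Wilson interval stays within [0, 1] and behaves better than the normal approximation for small samples such as the 30 biopsy WSIs.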
Table

Table 1. Seven-Category Agreement of 9 Pathologists and the Hepatocellular-Nodular AI Model (HnAIM) With the Majority Opinion of Subspecialists, at the Patch and Whole-Slide Image Levels, in 30 Liver Biopsy Specimens

Table 2. Lesion Characteristics of Patients With Indeterminate Diagnoses After Three Independent Reviews.
