部分监督多器官医学图像分割中的标记与未标记分布对齐|文献速递--基于多模态-半监督深度学习的病理学诊断与病灶分割
Title
题目
Labeled-to-unlabeled distributional alignment for partially supervised multi-organ medical image segmentation
部分监督多器官医学图像分割中的标记与未标记分布对齐
01
Introduction
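Multi-organ medical image segmentation (Mo-MedISeg) is a core and technically challenging research topic in medical image analysis. The task is to assign each pixel of an input image a semantic organ label (for example liver, kidney, spleen, or pancreas), with any region that cannot be clearly assigned to an organ treated as background (Cerrolaza et al., 2019). With advances in deep learning, including convolutional neural networks (CNNs), visual encoder-decoder architectures, and vision transformers, Mo-MedISeg has attracted wide attention and is used in clinical applications such as diagnostic intervention and treatment planning, across imaging modalities such as computed tomography (CT) and X-ray.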
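However, training deep semantic segmentation models under full supervision is challenging, because it typically requires a large number of accurate pixel-level annotations (Wei et al., 2016; Zhang et al., 2020a). For Mo-MedISeg the difficulty is greater: obtaining accurate, dense multi-organ annotations is time-consuming and laborious and depends on scarce expert knowledge. Existing public benchmark datasets such as LiTS (Bilic et al., 2023) and KiTS (Heller et al., 2019) usually annotate only a single organ, and other organs not relevant to the dataset's task are labeled as background. Such partially labeled datasets are much easier to obtain than fully labeled ones. To reduce the heavy demand for fully annotated datasets, partially-supervised learning (PSL) has been used to train Mo-MedISeg models from multiple partially annotated datasets (Zhou et al., 2019; Zhang et al., 2021a; Liu and Zheng, 2022). In this setting, each dataset provides labels for one organ class, and datasets are combined until all organ classes of interest are covered. This avoids the need for densely annotated data and makes it possible to integrate annotations from different institutions, especially when different institutions focus on different organs.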
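A key challenge of PSL for Mo-MedISeg is how to exploit the limited labeled pixels and the large number of unlabeled pixels without complicating the multi-organ segmentation model. Each partially labeled dataset annotates one specific organ with a binary map indicating whether a pixel belongs to the target organ. No labels are provided for the other foreground organs or for the background class, so the training set of each image contains both labeled and unlabeled pixels. Previous methods typically learn only from labeled pixels (Dmitriev and Kaufman, 2019; Zhou et al., 2019; Zhang et al., 2021a) or treat missing labels as background (Fang and Yan, 2020; Shi et al., 2021). However, the lack of supervision on unlabeled structures can easily lead to overfitting, and treating unlabeled anatomical structures as background can mislead and confuse the Mo-MedISeg model in the absence of prior information from fully annotated datasets. Some more advanced methods (Huang et al., 2020; Feng et al., 2021; Liu and Zheng, 2022) generate pseudo-supervision for unlabeled pixels, but they usually rely on plain self-training to produce pseudo-labels. They train teacher models on the labeled pixels, for example a separate single-organ segmentation model for each dataset. The teacher models then generate pseudo-labels for the unannotated organs in each partially labeled dataset, forming a pseudo multi-organ dataset that is used to train a student multi-organ segmentation model. The student's performance depends on the quality of the pseudo-labels, and unreliable pseudo-labels can cause severe confirmation bias and performance degradation (Chen et al., 2022a).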
Abstract
摘要
Partially-supervised multi-organ medical image segmentation aims to establish a unified semantic segmentation model through the integration of multiple partially-labeled datasets, each providing labels for a single class of organs. However, the scarcity of labeled foreground organs and the lack of supervision to distinguish unlabeled foreground organs from the background present significant challenges, leading to a mismatch between labeled and unlabeled pixels. Although existing pseudo-labeling methods can leverage both labeled and unlabeled pixels for learning, they are prone to performance degradation in this task due to their reliance on the assumption that labeled and unlabeled pixels share the same distribution. In this paper, we propose a labeled-to-unlabeled distribution alignment (LTUDA) framework that aligns feature distributions and enhances discriminative capability. Specifically, we introduce a cross-set data augmentation strategy that performs region-level mixing between labeled and unlabeled organs to reduce distribution discrepancy and enrich the training set. Additionally, we propose a prototype-based distribution alignment method that implicitly reduces intra-class variation and increases the separation between unlabeled foreground and background. This is achieved by encouraging consistency between the outputs of two prototype classifiers and a linear classifier. Extensive experimental results on the AbdomenCT-1K dataset and a union of four benchmark datasets (including LiTS, MSD-Spleen, KiTS, and NIH82) demonstrate that our method outperforms state-of-the-art partially-supervised methods by a significant margin, even surpassing fully-supervised methods.
部分监督多器官医学图像分割旨在利用多模态数据集开发统一的语义分割模型。每个数据集仅针对单一器官类别提供标注信息。然而,在实际应用中发现:1)对前景器官进行标记的有效性有限;2)缺乏有效的监督机制来区分未标记前景器官与背景区域;这给分割任务带来了巨大挑战:导致标注像素与未标注像素之间的分布不均衡问题严重存在。现有的伪标签方法虽然能够同时学习标注与未标注像素的空间布局信息;但其依赖于"标注像素与未标注像素空间分布相同"这一根本假设;因而难以有效应对这一特定任务中的分布偏移问题;最终导致模型性能明显下降。为此我们提出了一种基于特征分布对齐(LTUDA)的新框架;该框架通过对齐特征空间实现类内一致性增强的同时;还设计了新的判别模块以提升类间区分能力。具体而言:我们采用了跨模态数据增强策略;通过在不同类别器官之间进行区域级配准操作;有效降低了特征间的差异并扩大了训练样本多样性;同时我们还创新性地提出了基于原型匹配的分布对齐方法;通过优化过程使得两个原型分类器及一个线性分类器的输出保持高度一致性。实验结果表明:在AbdomenCT-1K数据集以及四个基准数据集(LiTS、MSD-Spleen、KiTS和NIH82)上开展的全面评估表明:我们的方法较现有的部分监督方法表现出了显著的优势:不仅达到了现有部分监督方法的高度水平;而且还超越了传统完全监督学习方案的表现水平
Method
方法
3.1. Preliminaries
Given a union of partially annotated datasets $\{D_1, D_2, \ldots, D_C\}$ ($D_c = \{(X, Y^l) \mid c = 1, 2, \ldots, C\}$), our goal is to train an end-to-end segmentation network that can simultaneously segment $C$ organs. $X$ denotes the images in the $c$-th dataset. The corresponding partial label $Y^l$ comprises labeled foreground organ pixels, unlabeled foreground organ pixels, and unlabeled background pixels. In $Y^l$, only organ $c$ has true class labels, and other organs are unknown. The multi-organ segmentation network needs to assign each pixel of an input image a unique semantic label $c \in \{0, 1, \ldots, C\}$, where $C$ denotes the number of organ classes.
CNN-based Mo-MedISeg networks are commonly trained with the multi-class cross-entropy loss $\mathcal{L}_{CE}(x, y) = -\sum_{c=0}^{C} y_c \log p_c$, where $y_c$ is the one-hot ground truth corresponding to class $c$. However, this is not suitable for partially-supervised segmentation. Given the absence of an annotated background class in this scenario, we learn class-specific foreground maps and transform the multi-class classification task into multiple binary classification tasks (one vs. the rest). Given an image from any partially labeled dataset $D_i$ as input, the segmentation network generates class-wise foreground maps $p = \{p_c\}_{c=1}^{C}$, which are normalized using a sigmoid. The partial binary cross-entropy loss $\mathcal{L}_{pBCE}$ is adopted to train the segmentation network, which is defined as follows:
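$$\mathcal{L}_{pBCE}(x, y) = -\sum_{c=1}^{C} \mathbb{I}_{y_c \neq -1} \left[ y_c \log p_c + (1 - y_c) \log (1 - p_c) \right], \tag{1}$$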
where $y_c$ and $p_c$ denote the ground truth and the predicted probability map for class $c$, respectively. For the labeled category, $y_c$ takes a binary value of 0 or 1 indicating whether a pixel belongs to that category; for all unlabeled categories, $y_c$ is set to $-1$. $\mathcal{L}_{pBCE}$ therefore optimizes only the segmentation maps of labeled categories. For instance, in LiTS only the liver is labeled, so for images from that dataset no gradient is back-propagated for the other, unlabeled categories. The segmentation network outputs a foreground probability map for each class; at inference, each pixel must be assigned a unique predicted class $\hat{y}$ from the label space $\{0, 1, \ldots, C\}$. Inspired by techniques used in anomaly detection (Hendrycks and Gimpel, 2016), we adopt a linear threshold-based classifier to generate multi-class segmentation maps. As shown in Equation (2), if the probabilities of all foreground classes for a pixel fall below a threshold $\tau$, the pixel is classified as background; otherwise it is assigned to the foreground class with the highest probability.
$$\hat{y} = \begin{cases} \arg\max_{c \in \{1, \ldots, C\}} p_c, & \text{if } \max_{c \in \{1, \ldots, C\}} p_c \geq \tau \\ 0, & \text{otherwise} \end{cases} \tag{2}$$
While ignoring unannotated categories is a straightforward solution, it results in suboptimal performance because only a small portion of the available data is used. As shown in Figure 1(a), the feature visualization of the baseline model indicates that it learns exclusively from labeled pixels and disregards the unlabeled organs. The unlabeled prototypes diverge significantly from their labeled counterparts, particularly for foreground categories such as pancreas and spleen. In addition, separating unlabeled foreground organs from the background based solely on the threshold $\tau$ is difficult, since background regions provide no supervision. Consequently, directly minimizing $\mathcal{L}_{pBCE}$ on labeled pixels alone may cause overfitting and confusion between foreground organs and background structures.
给定一个部分标注数据集的联合体 { 𝐷1, 𝐷2, …, 𝐷𝐶 } (𝐷𝑐 = { (𝑋, 𝑌 𝑙) | 𝑐 = 1, 2, …, 𝐶 }),我们的目标是训练一个端到端的分割网络,能够同时对 𝐶 个器官进行分割。𝑋 表示第 𝑐 个数据集中的图像。对应的部分标签 𝑌 𝑙 包括标记的前景器官像素、未标记的前景器官像素和未标记的背景像素。在 𝑌 𝑙 中,只有器官 𝑐 有真实的类别标签,其他器官的标签未知。多器官分割网络需要将输入图像的每个像素分配一个唯一的语义标签 𝑐 ∈ {0, 1, …, 𝐶},其中 𝐶 表示类别数量。基于卷积神经网络的 Mo-MedISeg 网络通常通过多类别交叉熵损失 L𝐶𝐸 (𝑥, 𝑦) = − ∑𝐶𝑐=0 𝑦𝑐 log 𝑝𝑐 进行训练,其中 𝑦𝑐 是与类别 𝑐 对应的一热编码真实值。然而,这不适用于部分监督分割。由于在这种情况下缺少标注的背景类别,我们学习特定类别的前景图,并将多分类任务转换为多个二分类任务(一个类别与其余类别的比较)。对于来自任何部分标注数据集 𝐷𝑖 的图像,分割网络生成类别特定的前景图 𝑝 = {𝑝**𝑐 } 𝐶 𝑐=1,这些图通过 sigmoid 函数进行归一化。采用部分二元交叉熵损失 L𝑝𝐵𝐶𝐸 来训练分割网络,其定义如下:
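$$\mathcal{L}_{pBCE}(x, y) = -\sum_{c=1}^{C} \mathbb{I}_{y_c \neq -1} \left[ y_c \log p_c + (1 - y_c) \log (1 - p_c) \right], \tag{1}$$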
其中符号 y_c 和 p_c 分别代表第 c 类的真实标签及其预测概率图。对于被标记的类别 i 来说,在其对应的分割区域内变量取值要么是 0 要么是 1;而对未被标记的类来说,则将变量取值设为 −1。所使用的损失函数仅针对那些具有明确标签的区域进行计算,并不会反向传播梯度至未标注区域。在推理过程中我们需要对每个像素赋予一个唯一的分类结果 \hat{y} ,其取值范围定义在集合 {0, 1, …, C} 上以确保每个像素都被正确归类到某一特定类中去。受异常样本检测方法(Hendrycks 和 Gimpel, 2016)的影响,在生成多分类分割结果时我们采用了基于线性阈值分类器的方法:若某像素所有前景类别的预测概率均低于设定阈值 \tau ,则将其归类到背景类;反之,则将其分配至具有最高预测概率的那一类。
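$$\hat{y} = \begin{cases} \arg\max_{c \in \{1, \ldots, C\}} p_c, & \text{if } \max_{c \in \{1, \ldots, C\}} p_c \geq \tau \\ 0 \text{ (background)}, & \text{otherwise} \end{cases} \tag{2}$$
Although simply ignoring the unannotated categories is a straightforward approach, it leads to suboptimal performance because only part of the data is used. Figure 1(a) shows the feature visualization of the baseline model, which learns only from labeled pixels and ignores the unlabeled organs. The unlabeled prototypes deviate markedly from the labeled prototypes, especially for foreground categories such as the pancreas and spleen. In addition, because no background supervision is available, the threshold $\tau$ alone is not enough to separate unlabeled foreground regions from the background. Directly minimizing $\mathcal{L}_{pBCE}$ on labeled pixels alone may therefore cause overfitting and confusion between foreground organs and the background.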
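The partial binary cross-entropy loss and the threshold rule of Equation (2) can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the authors' implementation: the function names `partial_bce` and `threshold_predict`, the clipping constant `eps`, the default `tau=0.5`, and the choice to sum (rather than average) over pixels are all assumptions made here for the example.

```python
import math


def partial_bce(probs, labels, eps=1e-7):
    """Partial BCE over class-wise foreground maps.

    probs[c][i]  : sigmoid probability of foreground class c at pixel i
    labels[c][i] : 1 or 0 for a labeled class, -1 for an unlabeled class
    Pixels labeled -1 contribute nothing (the indicator in the loss is 0).
    Summing over pixels is an assumption of this sketch.
    """
    total = 0.0
    for p_c, y_c in zip(probs, labels):        # loop over foreground classes
        for p, y in zip(p_c, y_c):             # loop over pixels
            if y == -1:                        # unlabeled class: skip
                continue
            p = min(max(p, eps), 1.0 - eps)    # avoid log(0)
            total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


def threshold_predict(pixel_probs, tau=0.5):
    """Threshold rule: background (0) if every foreground probability is
    below tau, otherwise the 1-based index of the most probable class."""
    best = max(range(len(pixel_probs)), key=lambda c: pixel_probs[c])
    return best + 1 if pixel_probs[best] >= tau else 0


# Hypothetical two-pixel image from a liver-only dataset,
# class order (liver, kidney); kidney labels are unknown (-1).
probs = [[0.9, 0.2], [0.6, 0.4]]
labels = [[1, 0], [-1, -1]]
loss = partial_bce(probs, labels)
prediction = [threshold_predict([probs[0][i], probs[1][i]]) for i in range(2)]
```

Note that the kidney map receives no loss term in this example even though its probability at the first pixel is 0.6, which is the behavior the text describes for unlabeled categories.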
Conclusion
结论
Partially-supervised segmentation faces the challenge of feature distribution mismatch between labeled and unlabeled pixels. To address this, this paper introduces a novel framework named LTUDA for partially supervised multi-organ segmentation. The approach consists of two core components. First, we implement a cross-set data augmentation strategy that generates interpolated samples between labeled and unlabeled pixels, thereby reducing the difference in their feature representations. Second, we introduce a prototype-based distribution alignment module designed to bridge the gap between labeled and unlabeled pixel distributions. This module enforces consistency between the labeled prototype classifier and the unlabeled prototype classifier to avoid confusion between unlabeled foreground and background. Experimental results on both toy datasets and large-scale partially labeled datasets demonstrate the effectiveness of our method. Our proposed method has the potential to enhance foundation models. Recent foundation models (Liu et al., 2023; Ye et al., 2023) focus on leveraging large-scale and diverse partially annotated datasets across different modalities (CT, MRI, PET) to segment various organs and tumors. However, these models typically do not utilize unlabeled pixels or account for domain shifts across institutions during training, leading to significant distribution mismatches between labeled and unlabeled pixels. Our framework specifically targets these challenges by addressing the distribution alignment issue, thereby improving the effectiveness of public datasets. Future research directions include applying partially supervised segmentation methods to realistic clinical scenarios (Yao et al., 2021). Our approach shows that multi-organ segmentation models can be trained effectively on small partially labeled datasets, which makes it convenient to customize a model to the needs of medical professionals. If public datasets lack specific anatomical structures of interest, experts only need to annotate a few additional classes to develop comprehensive multi-organ segmentation models.
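Partially-supervised segmentation faces a mismatch between the feature distributions of labeled and unlabeled pixels. Our method has the potential to support the development of foundation models. Recent work (Liu et al., 2023; Ye et al., 2023) focuses on using large-scale, diverse partially annotated datasets from public resources across modalities (such as CT, MRI, and PET) to segment different organs and tumors. However, these methods usually neither exploit unlabeled pixels nor account for the domain differences between partially labeled datasets from different institutions. This heterogeneity of the training data makes the distribution mismatch between labeled and unlabeled pixels more pronounced. Our labeled-to-unlabeled distribution alignment framework targets this problem. A direction worth exploring in future work is applying partially-supervised segmentation to more realistic clinical scenarios (Yao et al., 2021). Our study shows that a multi-organ segmentation model can be trained with only small partially labeled datasets, which lets medical experts customize a model to their clinical needs. If public datasets do not cover certain anatomical structures of interest, annotating a small number of additional classes is enough to build a complete multi-organ segmentation model.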
Figure
图

Fig. 1. Comparisons of t-SNE feature visualization on the toy dataset consisting of four partially-labeled sub-datasets. The feature distribution of labeled and unlabeled pixels for different classes is visualized. For each foreground category, only one sub-dataset provides a labeled set, while the other three provide unlabeled sets. Since each sub-dataset does not provide the true label of the background, the background is completely unlabeled. We have superimposed the feature centers of the labeled set and unlabeled set, i.e., labeled prototypes and unlabeled prototypes, of each foreground category on the feature distribution. Additionally, we visualized the feature center of the background classes across all subsets. (a) Baseline model (trained on labeled pixels). The labeled prototype and unlabeled prototypes of the foreground classes are not aligned. (b) Baseline model with cross-set data augmentation (CDA). The CDA strategy effectively reduces the distributional discrepancy between labeled and unlabeled pixels for the foreground classes. (c) Our proposed method. The labeled prototype and unlabeled prototypes of each foreground class almost overlap.
图 1展示了t-SNE技术在由四个部分标记的子数据集构成的玩具数据集合中的特征可视化结果。该图表详细描绘了不同类别之间的标记与非标记像素特征分布情况。对于每一个前景类别而言,只有一个子数据集提供了完整的标记信息,而其他三个子数据集中则均为非标记状态。值得注意的是,由于所有子数据集中背景类别的真实标签均缺失,因此背景区域被明确定义为完全非监督的状态。为了更直观地反映各分类间的差异性,我们叠加展示了每个前景类别对应的标记与非标记像素特征中心位置(即所谓的"原型")。此外,该图表还整合了三种不同的实验结果:(a)基础模型;(b)引入跨域增强策略后的模型;(c)本文提出的新方法。实验结果表明,本文方法相较于前两种方案表现出更为优越的表现
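The caption of Fig. 1 describes prototypes as the feature centers of the labeled and unlabeled sets of a class. A minimal sketch of that idea is given below, assuming features are plain lists of floats and that set membership is given by 0/1 masks. The helper names `prototype` and `prototype_gap` are hypothetical, and using the Euclidean distance between the two prototypes as a rough measure of misalignment is an assumption of this sketch, not a metric stated in the source.

```python
import math


def prototype(features, mask):
    """Mean feature vector (feature center) of the pixels selected by mask.

    features : list of per-pixel feature vectors (lists of floats)
    mask     : list of 0/1 flags, one per pixel
    Returns None when no pixel is selected.
    """
    selected = [f for f, m in zip(features, mask) if m]
    if not selected:
        return None
    dim = len(selected[0])
    return [sum(f[d] for f in selected) / len(selected) for d in range(dim)]


def prototype_gap(features, labeled_mask, unlabeled_mask):
    """Euclidean distance between labeled and unlabeled prototypes of one
    class (an assumed proxy for how far apart the two sets are)."""
    p_labeled = prototype(features, labeled_mask)
    p_unlabeled = prototype(features, unlabeled_mask)
    if p_labeled is None or p_unlabeled is None:
        return None
    return math.dist(p_labeled, p_unlabeled)


# Hypothetical 2-D features for four pixels of the same organ:
# the first two come from the labeled set, the last two from an unlabeled set.
features = [[0.0, 0.0], [2.0, 2.0], [4.0, 4.0], [6.0, 6.0]]
gap = prototype_gap(features, [1, 1, 0, 0], [0, 0, 1, 1])
```

Under this reading, panel (a) corresponds to a large gap for each foreground class and panel (c) to a gap close to zero.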

Fig. 2. (a) The overall framework of the proposed LTUDA method, which consists of cross-set data augmentation and prototype-based distribution alignment. (b) Details of the prototype-based distribution alignment module. Our method builds on the well-known teacher-student framework: the teacher model takes weakly augmented inputs (rotation and scaling), while the student model takes strongly augmented inputs (cross-set region-level mixing). The linear classifier is the linear threshold-based classifier of Equation (2) in Section 3.1. The student model has two prototype classifiers, and the teacher model's predictions together with the partial labels serve as pseudo-labels that supervise all three classifiers of the student model. "Copy" indicates that, for the background class, the labeled prototype is set equal to the unlabeled prototype.
图 2.(a)所展示的是LTUDA方法的整体架构,不仅包含了跨域数据增强技术以及基于 prototype 的 distribution 对齐方法,具体阐述了基于 prototype 的 distribution alignment module 的细节信息。我们采用的是广泛认可的 teacher-student 框架,分别采用了弱增强措施(如旋转与缩放)以及强增强策略(涉及跨区域混合处理)来处理教师与学生的输入图像。线性分类器对应于第3.1节所述方程(2),即线性阈值分类器。在 student 模型中引入了两个 prototype 分类器,其中术语‘复制’具体指背景类别中的 annotated prototype 设置与未 annotated prototype 等值处理的情况,用于监督 student 模型中三个分类器的学习过程

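Fig. 3. Visualization of different strongly augmented images generated by CutMix. (a) and (b) paste the patch cropped from $x_b^w$ into the same position in $x_a^w$. The coordinates and sizes of the crop boxes in (a) and (b) differ.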
图 3. 使用 CutMix 生成的不同强增强图像的可视化。
(a) 和 (b) 将被复制源自 x_b^w 的图像块并粘贴至与之相同的 x_a^w 位置。(a) 和 (b) 的裁剪框坐标与尺寸各不相同。
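The same-position paste described in the Fig. 3 caption can be sketched as follows. This is a simplified illustration on 2-D nested lists, not the authors' pipeline: the function names `cutmix_same_position` and `random_box`, the `(top, left, height, width)` box layout, and the uniform sampling of box size and position are assumptions made for this example.

```python
import random


def cutmix_same_position(x_a, x_b, box):
    """Paste the patch of x_b inside box into the same position of x_a.

    x_a, x_b : 2-D images of equal size, as lists of rows
    box      : (top, left, height, width) of the crop box
    Returns a new image; x_a is left unchanged.
    """
    top, left, height, width = box
    mixed = [row[:] for row in x_a]            # copy so x_a is not modified
    for r in range(top, top + height):
        mixed[r][left:left + width] = x_b[r][left:left + width]
    return mixed


def random_box(img_height, img_width, rng):
    """Sample a crop box that lies fully inside the image
    (uniform sampling is an assumption of this sketch)."""
    height = rng.randint(1, img_height)
    width = rng.randint(1, img_width)
    top = rng.randint(0, img_height - height)
    left = rng.randint(0, img_width - width)
    return top, left, height, width


# Two hypothetical weakly augmented images from different partially
# labeled datasets; drawing two boxes yields two different strong views.
x_a = [[0] * 4 for _ in range(4)]
x_b = [[1] * 4 for _ in range(4)]
rng = random.Random(0)
view_1 = cutmix_same_position(x_a, x_b, random_box(4, 4, rng))
view_2 = cutmix_same_position(x_a, x_b, random_box(4, 4, rng))
```

In a partially-supervised setting the same box would also be applied to the corresponding label or pseudo-label maps, so that the mixed image and its supervision stay aligned.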

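Fig. 4. Visualization results on the LSPL dataset. From left to right, the examples come from LiTS, MSD-Spleen, KiTS, and NIH82. (a) Single-organ annotations originally provided by the four benchmark datasets. (b) Full annotations of the four organs. (c)-(g) Segmentation results of different methods. The white boxes highlight regions where our method gives better predictions.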
图4展示了LSPL的可视化示例。这些案例均源自于LiTS、MSD-Spleen、KiTS和NIH82等数据集,并按从左至右的顺序依次排列。(a)四个基准数据集各自提供的单脏器标注结果。(b)对四个脏器进行完整的注释。(c)-(g)展示了多种分割方法的结果对比。通过白色方框突出显示的方式可以看出我们提出的方法在预测效果上具有显著优势。

Fig. 5. Ablation results. (a) The number of prototypes per class. (b) The number of strong views.
图 5. 消融结果。(a) 每个类别的原型数量。(b) 强视图的数量。

Fig. 6. Visualization of different strongly augmented images generated by CarveMix. In (a) and (b), the ROI cropped from $x_b^w$ is pasted into the same position in $x_a^w$. (a) and (b) crop ROIs from different foreground organs.
图 6展示了CarveMix生成的不同强度增强图像的可视化效果。(a) 和 (b) 基于从 x_{bw}提取并粘贴至同一位置的ROIs进行操作。(a) 和 (b)分别处理了不同前景器官对应的ROIs区域

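Fig. 7. (a) Performance when training on partially labeled data with varying amounts of labeled data. (b) Performance when training jointly on labeled and unlabeled data with varying amounts of labeled data.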
图 7. (a) 随着标记数据量的变化,在部分标注样本上进行训练的学习表现。(b) 随着标注度量变化,在混合使用标注样本与非标注样本的情况下进行联合训练的学习表现。

Fig. 8. Ablation results of different segmentation backbones.
图 8. 不同分割骨干网络的消融结果。

Fig. 9. Ablation study of weight parameters of L𝑙𝑝𝑟𝑜𝑡𝑜 and L𝑢𝑙𝑝𝑟𝑜𝑡𝑜.
图 9. L𝑙𝑝𝑟𝑜𝑡𝑜 和 L𝑢𝑙𝑝𝑟𝑜𝑡𝑜 权重参数的消融研究

Fig. 10. Ablation study of weight parameters of L𝑝𝑝𝑑 and L𝑝𝑝𝑐 .
图 10. L𝑝𝑝𝑑 和 L𝑝𝑝𝑐 权重参数的消融研究。
Table
表

Table 1. A brief description of the large-scale partially-labeled dataset.
表1 大规模部分标记数据集的简要描述。

Table 2. Quantitative results of partially-supervised multi-organ segmentation on the toy dataset.
表2 在玩具数据集上进行部分监督多脏器分割的定量结果。

Table 3. Quantitative results of partially-supervised multi-organ segmentation on four partially-labeled datasets (referred to as the LSPL dataset).
实验表3在四个子区域标记数据集(LSPL 数据集)上进行部分监督式的多器官分割定量评估结果

Table 4. Ablation study of key components on the LSPL dataset. CDA denotes cross-set data augmentation, and PDA denotes prototype-based distribution alignment. Green numbers indicate the performance improvement over the baseline.
该表格在 LSPL 数据集中对关键组件进行消融研究。其中 CDA 代表跨集数据增强方法,则 PDA 则代表基于原型的方法用于分布对齐。通过对比实验结果表明,在测试条件下采用该方法可获得显著的效果提升

Table 5. Generalization performance of different methods on two multi-organ datasets.
表5 在两个多脏器数据集上不同方法的泛化性能。

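Table 6. Ablation study of key components on the toy dataset. CDA denotes cross-set data augmentation, and PDA denotes prototype-based distribution alignment. Green numbers indicate the performance improvement over the baseline.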
表6展示了在玩具数据集上对关键组件消融效果的研究情况。其中CDA代表跨集合的数据增强技术,而PDA则代表基于原型的分布对齐方法。实验结果表明,在基线模型的基础上采用了绿色标注的数据标记策略后,在测试集上的性能提升幅度较大。

Table 7. Ablation results of different data augmentation methods.
表 7 不同数据增强方法的消融结果。

Table 8. Comparison of different paste position strategies for CutMix.
表 8 CutMix 不同粘贴位置策略的比较。

Table 9. Intra-class and inter-class variance for different methods.
表 9 不同方法的类内方差和类间方差。
