Unsupervised Medical Image Translation with Adversarial Diffusion Models | Literature Digest: Deep Learning for Medical Imaging Diagnosis and Lesion Segmentation
Title
Unsupervised Medical Image Translation with Adversarial Diffusion Models
01
Literature Digest Introduction
Multi-modal imaging plays a key role in the comprehensive assessment of human anatomy and function [1]. The complementary tissue information captured by each modality improves diagnostic accuracy and boosts performance in downstream imaging tasks. However, widespread adoption of multi-modal protocols remains difficult under the combined pressure of economic and labor costs.
Medical image translation is a powerful strategy for this problem: it recovers the missing target modality from an already acquired source modality. Because tissue signals vary nonlinearly across modalities, this translation is an inherently challenging task.
Here, learning-based methods offer performance benefits by incorporating nonlinear, data-driven priors that improve the conditioning of the problem.
How do learning-based methods translate images? They train deep neural networks to model the mapping between source and target modalities, extracting features from the data to learn this cross-modal relationship. Recent studies have centered on generative adversarial networks (GANs), whose notable strength is the synthesis of high-quality, realistic target-modality images.
By capturing a prior on the target distribution through this adversarial mechanism, the discriminator guides the generator to perform a one-shot mapping from source to target images. This framework has shown remarkable success across many image translation tasks, including synthesis across MR scanners, multi-contrast MR synthesis, and cross-modality synthesis.
Despite their power, GANs characterize the target-modality distribution only implicitly, through the interplay between generator and discriminator, rather than via an explicit likelihood. This implicit characterization leaves training vulnerable to learning biases such as premature convergence and mode collapse. Moreover, GANs infer the mapping in a fast, one-shot sampling step with no intermediate refinement, which fundamentally limits the reliability of the learned mapping. In turn, these limitations can constrain the quality and diversity of the synthesized images.
As a promising alternative, recent computer vision studies have introduced diffusion models built on explicit likelihood characterization and gradual sampling, which have improved sample quality in unconditional generative modeling tasks. Yet the adoption of diffusion methods in medical image translation still faces major obstacles: the high computational cost of image sampling, and the difficulty of performing unpaired training within conventional diffusion frameworks.
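To ground this comparison, here is a minimal sketch of standard DDPM sampling dynamics in NumPy (textbook conventions, not code from the paper), illustrating why diffusion inference proceeds through many small, likelihood-grounded steps rather than a single GAN-style mapping:

```python
import numpy as np

# Minimal sketch of standard (unconditional) DDPM dynamics. All names follow
# generic DDPM conventions and are not taken from the paper's implementation.

T = 1000                                   # number of diffusion steps
betas = np.linspace(1e-4, 0.02, T)         # forward noise schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)            # cumulative signal retention

def forward_diffuse(x0, t, rng):
    """Sample x_t ~ q(x_t | x_0) in closed form."""
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise

def reverse_sample(denoiser, shape, rng):
    """Ancestral sampling: T small reverse steps, each guided by the denoiser."""
    x = rng.standard_normal(shape)         # start from isotropic Gaussian x_T
    for t in reversed(range(T)):
        eps = denoiser(x, t)               # network's estimate of the added noise
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        x = mean + (np.sqrt(betas[t]) * rng.standard_normal(shape) if t > 0 else 0.0)
    return x
```

The T-fold repetition of `denoiser` calls is precisely the computational burden, and the reliance on conditioning data is the unpaired-training obstacle, that the paper sets out to address.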
Abstract
Imputation of missing images via source-to-target modality translation can improve the diversity of medical imaging protocols. A pervasive approach for synthesizing target images involves one-shot mapping through generative adversarial networks (GANs). However, GANs, which characterize the image distribution only implicitly, can suffer from limited sample fidelity. To address this challenge, we introduce SynDiff, a novel method based on conditional diffusion processes that directly captures a correlate of the image distribution. During inference, large diffusion steps are combined with adversarial projections in the reverse direction for efficient and accurate sampling. To enable training on unpaired datasets, we devise a cycle-consistent architecture with coupled diffusive and non-diffusive modules that bilaterally translate between two imaging modalities. Extensive evaluations on multi-contrast MRI and MRI-CT translation demonstrate that SynDiff quantitatively and qualitatively outperforms competing GAN and diffusion models.
Method
We demonstrated SynDiff on two multi-contrast brain MRI datasets (IXI, BRATS [61]) and a multi-modal pelvic MRI-CT dataset. Each dataset was split three ways into training, validation, and test sets with no subject overlap between the sets. The unsupervised translation models examined here were trained on unpaired images, yet paired and spatially registered source-target volumes are required for performance assessment. Accordingly, volumes within each subject in the validation and test sets were spatially registered, using affine transformations in FSL driven by a mutual-information cost. Afterwards, each imaging volume was normalized to a uniform mean intensity across subjects, and the maximum voxel intensity across subjects was standardized to a consistent range. Finally, all cross-sections were zero-padded as needed to a uniform size of 256 × 256 pixels.
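A minimal sketch of this preprocessing pipeline is given below, assuming FSL's `flirt` binary is installed and on the path; the helper names and the exact normalization choices are illustrative assumptions consistent with the description above:

```python
import subprocess
import numpy as np
import nibabel as nib

def load_volume(path):
    """Load a NIfTI volume as a float array."""
    return nib.load(path).get_fdata()

def register_to_target(source_path, target_path, out_path):
    """Affine registration of a source volume onto the target grid via FSL FLIRT,
    using a mutual-information cost as described above."""
    subprocess.run([
        "flirt", "-in", source_path, "-ref", target_path,
        "-out", out_path, "-dof", "12", "-cost", "mutualinfo",
    ], check=True)

def normalize_volume(vol):
    """Scale to a common mean intensity, then bound the maximum to 1."""
    vol = vol / vol.mean()
    return vol / vol.max()

def pad_to_256(cross_section):
    """Zero-pad a 2D cross-section to 256 x 256 (assumes in-plane size <= 256)."""
    ph = 256 - cross_section.shape[0]
    pw = 256 - cross_section.shape[1]
    return np.pad(cross_section, ((ph // 2, ph - ph // 2), (pw // 2, pw - pw // 2)))
```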
Conclusion
This study introduced a novel adversarial diffusion model for medical image translation between source and target modalities. SynDiff achieves efficient synthesis of target images via a fast diffusion process, with accurate reverse sampling provided by a conditional adversarial projector. Unsupervised learning is enabled by a cycle-consistent architecture that couples diffusion processes across the two imaging modalities. SynDiff outperforms state-of-the-art GAN and diffusion models, holding great promise for high-fidelity medical image translation. Beyond translation, its conditional diffusion approach may also offer performance benefits over GAN-based methods in other applications such as denoising and super-resolution.
Results
We demonstrated SynDiff for unsupervised MRI contrast translation against state-of-the-art non-attentional GANs (cGAN, UNIT, MUNIT), attentional GANs (AttGAN, SAGAN), and regular diffusion models (DDPM, UNIT-DDPM). First, experiments were conducted on brain images of healthy subjects from the IXI database.
Figure

Fig. 1. a) Diffusion models gradually transform actual image samples of a given modality (x_0) into isotropic Gaussian noise (x_T) over a large number of steps T (typically on the order of a thousand). Each forward step (right arrows) increases the noise level of the sample according to the forward transition probability q(x_t | x_{t-1}).
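For reference, the forward transition named in this caption and its closed form are the standard DDPM expressions (textbook notation, consistent with the boldface-vector convention noted in TABLE I; not quoted from the paper):

```latex
q(\mathbf{x}_t \mid \mathbf{x}_{t-1}) = \mathcal{N}\!\left(\mathbf{x}_t;\ \sqrt{1-\beta_t}\,\mathbf{x}_{t-1},\ \beta_t \mathbf{I}\right),
\qquad
q(\mathbf{x}_t \mid \mathbf{x}_0) = \mathcal{N}\!\left(\mathbf{x}_t;\ \sqrt{\bar{\alpha}_t}\,\mathbf{x}_0,\ (1-\bar{\alpha}_t)\,\mathbf{I}\right),
\qquad
\bar{\alpha}_t = \prod_{s=1}^{t}(1-\beta_s),
```

where β_t is the noise-schedule variance at step t.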

Fig. 2. For unsupervised learning, SynDiff leverages a cycle-consistent architecture that translates bidirectionally between two modalities (A, B). To synthesize a target image x̂_0^A of modality A, the diffusive module in Fig. 1b requires guidance from a source image y^B of modality B with matching anatomy, which is unavailable when training data are unpaired. To address this, SynDiff employs a non-diffusive module to first estimate a paired source image ỹ^B from x_0^A; likewise, a paired source image ỹ^A is estimated from x_0^B for synthesizing modality B. a) The non-diffusive module comprises two generator-discriminator pairs (G_φ^{A,B}, D_φ^{A,B}) that compute initial translation estimates for x_0^A → ỹ^B (orange) and x_0^B → ỹ^A (green). b) These initial estimates then serve as the guiding source images in the diffusive module, which uses two generator-discriminator pairs (G_θ^{A,B}, D_θ^{A,B}) to compute denoised image estimates for (x_t^A, ỹ^B, t) → x̂_{t−k}^A (yellow) and (x_t^B, ỹ^A, t) → x̂_{t−k}^B (blue), enforcing cycle consistency during training.
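To make the interplay of the two modules concrete, the sketch below shows one schematic generator update for a cycle-consistent design in the spirit of this figure, in plain PyTorch. The module arguments, the `noisify` helper, the weights `lam_cyc`/`lam_rec`, and the L1 surrogate standing in for the paper's adversarial losses are all illustrative assumptions, not the authors' implementation:

```python
import torch.nn.functional as F

def generator_step(G_phi_AB, G_phi_BA, G_theta_A, noisify,
                   x_A0, x_B0, t, k, lam_cyc=1.0, lam_rec=1.0):
    # Non-diffusive module: initial translation estimates from unpaired data.
    y_B_tilde = G_phi_AB(x_A0)     # x_0^A -> paired-source estimate in modality B
    y_A_tilde = G_phi_BA(x_B0)     # x_0^B -> paired-source estimate in modality A

    # Cycle consistency: translating back should recover the original images.
    loss_cyc = F.l1_loss(G_phi_BA(y_B_tilde), x_A0) \
             + F.l1_loss(G_phi_AB(y_A_tilde), x_B0)

    # Diffusive module: denoise x_t^A guided by the estimated source y~^B.
    x_At = noisify(x_A0, t)            # x_t^A ~ q(x_t | x_0^A)
    x_A_ref = noisify(x_A0, t - k)     # reference sample at noise level t - k
    x_A_hat = G_theta_A(x_At, y_B_tilde.detach(), t)

    # The paper scores x_A_hat adversarially via D_theta; L1 is a stand-in here.
    loss_rec = F.l1_loss(x_A_hat, x_A_ref)
    return lam_cyc * loss_cyc + lam_rec * loss_rec
```

The symmetric update for modality B follows by swapping the roles of A and B.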

Fig. 3. SynDiff was demonstrated on IXI for translation between MRI contrasts. Synthesized images from competing methods are shown along with the source and ground-truth target (reference) images for representative a) T1→T2 and b) T2→PD tasks. Display windows of a) [0, 0.65] and b) [0, 0.80] are used. Compared to baselines, SynDiff achieves lower noise and artifact levels while maintaining high anatomical accuracy.

Fig. 4. SynDiff was demonstrated on BRATS for translation between MRI contrasts. Synthesized images are shown along with the source and ground-truth target (reference) images for representative a) T1→T2 and b) T2→FLAIR tasks. Display windows of a) [0, 0.75] and b) [0, 0.80] are used. Compared to baselines, SynDiff lowers noise and artifact levels and depicts detailed structure more accurately.

Fig. 5. SynDiff was demonstrated on the pelvic dataset for multi-modal MRI-CT translation. Synthesized images are displayed along with the source and the ground-truth target (reference) images for representative a) T2→CT and b) accelerated T1→CT tasks. Display windows of a) [-1000, 1050] HU and b) [-1000, 1000] HU are used. Compared to diffusion and GAN baselines, SynDiff achieves lower artifact levels and more accurately estimates anatomical structure near diagnostically relevant regions.

Fig. 6. Performance of competing methods as a function of the noise level added to source-modality images, shown for the representative T2→CT task in terms of PSNR (left) and SSIM (right).
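The PSNR/SSIM values plotted here (and reported in the tables below) can be computed with standard routines; the exact evaluation settings, such as masking and data range, are assumptions in this sketch:

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_pair(pred, ref):
    """PSNR (dB) and SSIM (%) for one synthesized/reference image pair."""
    rng = float(ref.max() - ref.min())
    psnr = peak_signal_noise_ratio(ref, pred, data_range=rng)
    ssim = structural_similarity(ref, pred, data_range=rng) * 100.0
    return psnr, ssim

# Mean +/- std across a test set, as reported in TABLES II-IV:
# scores = np.array([evaluate_pair(p, r) for p, r in zip(preds, refs)])
# print(scores.mean(axis=0), scores.std(axis=0))
```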

Fig. 7. SynDiff's adversarial projector with T/k = 4 steps was compared against a variant using an ℓ1-loss projector with T/k = 4 and T/k = 1000 steps. Representative images from unconditional synthesis tasks are shown for a) T1 images in IXI, b) T2 images in BRATS, and c) CT images in the pelvic dataset.
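A sketch of the accelerated reverse sampling this figure probes: with a learned projector that jumps k steps at a time, only T/k network evaluations are needed (e.g., T/k = 4). The `projector` call signature is an assumed interface, not the authors' API:

```python
import numpy as np

def fast_reverse_sample(projector, y_source, shape, T=1000, k=250, rng=None):
    """Reverse diffusion with large step size k: T/k projector calls in total."""
    rng = rng or np.random.default_rng()
    x = rng.standard_normal(shape)       # start from isotropic Gaussian noise x_T
    for t in range(T, 0, -k):            # t = T, T-k, ..., k
        x = projector(x, y_source, t)    # adversarial projector: x_t -> x_{t-k}
    return x                             # final estimate of x_0
```

This is why an expressive adversarial projector matters: with only a few large steps, each reverse transition becomes strongly multi-modal, which a simple ℓ1-trained projector tends to over-smooth.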
Table

TABLE I provides a detailed description of variables related to images, diffusion processes, networks, and probability distributions. Throughout the manuscript, vectorial quantities are systematically annotated using boldface notation.

TABLE II lists performance for multi-contrast MRI translation tasks on the IXI dataset. PSNR (dB) and SSIM (%) are reported as mean ± std across the test set. Boldface marks the top-performing model in each task.

TABLE III lists performance for multi-contrast MRI translation tasks on the BRATS dataset. PSNR (dB) and SSIM (%) are reported as mean ± std across the test set.

TABLE IV lists performance for multi-modal MRI-CT translation tasks on the pelvic dataset. PSNR (dB) and SSIM (%) are reported as mean ± std across the test set. 'acc.' stands for accelerated.

TABLE V lists the average training time per cross-section (s), inference time per cross-section (s), and memory load (GB).

TABLE VI lists the performance of variant models in unconditional synthesis tasks. FID (Fréchet Inception Distance, which measures the discrepancy between generated and real image distributions) is reported across the training set.

TABLE VII lists the performance of variant models with the adversarial loss, the cycle-consistency loss, or the diffusion module removed. PSNR (dB) and SSIM (%) are reported as mean ± std across the test set.

TABLE VIII lists the performance of variant models under varying step counts T/k and varying loss-term weights (λ₁φ, λ₁θ, λ₂φ, λ₂θ). PSNR (dB) and SSIM (%) are reported as mean ± std across the test set.

TABLE IX lists the performance of variant models as mean ± std across the test set. In these variants, the non-diffusive module was pretrained. In the pretrained-frozen configuration, the non-diffusive module was kept fixed while the diffusive module was trained; in the pretrained-trained configuration, both modules were trained simultaneously.

TABLE X lists the performance of variant models as mean ± std across the test set. In these variants, the non-diffusive module was trained exclusively for n_ND epochs, whereas the diffusive module was trained to completion.
