
Geometry-Aware Attenuation Learning for Sparse-View CBCT Reconstruction | Literature Digest: Deep Learning-Based Lesion Segmentation and Data Super-Resolution


Title


Geometry-Aware Attenuation Learning for Sparse-View CBCT Reconstruction


01

Literature Digest Introduction

A Geometry-Aware Learning Approach for Sparse-View Cone-Beam Computed Tomography (CBCT) Reconstruction

Cone-beam computed tomography (CBCT) is widely used in the diagnosis of dental, spinal, and vascular diseases.

Conventional CBCT reconstruction relies mainly on filtered back projection (FBP), which requires a large number of projection views. To address this drawback, iterative optimization methods have been developed. Although they improve reconstruction quality to some extent under sparse input, they fall short in both time efficiency and detail recovery. With the recent development of deep learning, researchers have begun to cast the mapping from projections to CBCT images as an end-to-end learning problem. However, these methods fuse multi-view information directly while ignoring the geometric properties of the CBCT system, leading to structural errors. Meanwhile, neural rendering has made remarkable progress in novel view synthesis and multi-view reconstruction and has been applied to CBCT reconstruction, but it requires time-consuming per-scan optimization and performs poorly when the input is extremely sparse.

This work focuses on two technical difficulties of sparse-view CBCT reconstruction: (1) how to bridge the dimensional gap between the 2D X-ray projections and the 3D reconstruction; (2) how to cope with the information loss caused by extremely sparse input views. To this end, we propose a novel geometry-aware encoder-decoder framework that combines multi-view consistency with the generalization power of deep learning: first, a 2D convolutional neural network (CNN) encoder extracts diverse multi-view features from each X-ray projection; these features are then back-projected into 3D space to bridge the dimensional gap; an adaptive feature fusion strategy next aggregates the features from the different views; finally, a 3D CNN decoder converts the result into a complete 3D CBCT image.
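To make this pipeline concrete, here is a minimal PyTorch-style sketch of how the four stages might compose. All names here (`GeometryAwareReconstructor`, the `geometry` dictionary, the tensor shapes) are our own illustrative assumptions, not the authors' implementation; `backproject_features` and `AdaptiveFusion` are sketched in the Method section below.

```python
import torch.nn as nn

class GeometryAwareReconstructor(nn.Module):
    """Illustrative skeleton: encode -> backproject -> fuse -> decode."""

    def __init__(self, encoder_2d, fuse, decoder_3d):
        super().__init__()
        self.encoder_2d = encoder_2d   # shared 2D CNN applied to every view
        self.fuse = fuse               # adaptive multi-view feature fusion
        self.decoder_3d = decoder_3d   # volume-wise 3D CNN decoder

    def forward(self, projections, geometry, voxel_coords, vol_shape):
        # projections: (N, 1, H, W) X-ray views; voxel_coords: (M, 3) query
        # points, with M = W' * H' * D' voxels of the (downsampled) volume.
        feats_2d = self.encoder_2d(projections)            # (N, C, h, w)
        per_view = backproject_features(                   # (N, M, C)
            feats_2d, voxel_coords, **geometry)
        fused = self.fuse(per_view)                        # (M, C)
        vol = fused.t().reshape(1, -1, *vol_shape)         # (1, C, W', H', D')
        return self.decoder_3d(vol)                        # (1, 1, W, H, D)
```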

Owing to its geometry awareness, the framework extracts reliable features from the multi-view X-ray projections while drawing on rich and diverse training data. Even when only 5 to 10 views are provided as input, it generalizes well under such extreme sparsity. Evaluations on two simulated datasets and one real-world clinical dataset demonstrate that the method is not only effective but also highly time-efficient.

Abstract


Cone-beam computed tomography (CBCT) plays an irreplaceable role in clinical imaging. Conventional CBCT reconstruction typically requires hundreds of 2D X-ray projections to produce a high-quality 3D CBCT image, which entails considerable radiation exposure and has motivated strong interest in sparse-view CBCT reconstruction to reduce radiation dose. Despite recent advances in deep learning and neural rendering, existing methods either fail to produce satisfactory results or suffer from efficiency problems caused by per-scan optimization. In this paper, we propose a novel geometry-aware encoder-decoder framework to address this challenge. The framework first extracts multi-view 2D features with a 2D CNN encoder; it then exploits the geometry of the CBCT scanning process to back-project these 2D features into 3D space, constructing a comprehensive volumetric feature map; finally, a 3D CNN decoder recovers the 3D CBCT image. Notably, the back-projection step respects the geometric relationship between the 3D CBCT image and its 2D X-ray projections, and the knowledge learned from the data population allows the framework to handle inputs of as few as 5 to 10 X-ray projections without per-scan optimization. Extensive evaluation on two simulated datasets and one real-world dataset demonstrates the superior reconstruction quality and efficiency of our method.


Method


Figure 3 illustrates the architecture of our proposed method. Given a set of sparse-view X-ray projections {P_i}_{i=1}^{N}, our goal is to reconstruct a CBCT image V_pred ∈ ℝ^{1×W×H×D} that is as close as possible to the ground truth V_gt. The process begins by extracting features from all input projections with a shared 2D CNN encoder. We then combine a feature backprojection module with an adaptive feature fusion step to construct a 3D feature map; this stage is the crux of the method, as it bridges the dimensional gap between the 2D projections and the 3D volumetric data. Finally, a 3D CNN decoder recovers the target CBCT image from the fused feature map. A key strength of our approach is its ability to leverage prior knowledge learned from the training population, enabling effective reconstruction from very few views (e.g., 5 or 10) without patient-specific optimization. Moreover, the volume-wise CNN decoder acts as a learnable filter that reduces noise and extracts more robust features; this is especially valuable for capturing global structural information and mitigating the streak artifacts commonly associated with sparse-view reconstruction. Together, these components yield superior reconstruction accuracy and stability across diverse clinical scenarios.
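The geometric core of the method is the backprojection of every 3D query point onto each detector plane. Below is a minimal sketch of this step under a simplified circular cone-beam geometry (source rotating in the z = 0 plane around the isocenter, flat detector); the function name, arguments, and geometry conventions are our assumptions, not the paper's code.

```python
import torch
import torch.nn.functional as F

def backproject_features(feats, pts, angles, src_dist, det_dist, det_size):
    """Sample per-view 2D features at the cone-beam projections of 3D points.

    feats    : (N, C, h, w) encoder feature maps, one per X-ray view
    pts      : (M, 3) query points in world coordinates (isocenter at origin)
    angles   : (N,) gantry angles in radians
    src_dist : source-to-isocenter distance
    det_dist : isocenter-to-detector distance
    det_size : (width, height) of the detector in world units
    Returns  : (N, M, C) per-view feature vectors for every query point.
    """
    N, C, _, _ = feats.shape
    out = []
    for i in range(N):
        c, s = torch.cos(angles[i]), torch.sin(angles[i])
        # Rotate world points into this view's frame (source on the +x axis).
        x = c * pts[:, 0] + s * pts[:, 1]
        y = -s * pts[:, 0] + c * pts[:, 1]
        # Perspective (cone-beam) projection onto the flat detector.
        mag = (src_dist + det_dist) / (src_dist - x)
        u = y * mag / (det_size[0] / 2)           # normalized to [-1, 1]
        v = pts[:, 2] * mag / (det_size[1] / 2)
        grid = torch.stack([u, v], dim=-1).view(1, 1, -1, 2)
        # Bilinear lookup; points falling off the detector receive zeros.
        sampled = F.grid_sample(feats[i:i + 1], grid, align_corners=True)
        out.append(sampled.view(C, -1).t())       # (M, C)
    return torch.stack(out)                       # (N, M, C)
```

Each voxel thus collects one feature vector per view; the fusion step sketched next merges these N vectors into a single volumetric feature.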

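The adaptive feature fusion then aggregates the per-view features into one volumetric feature map. The exact scheme is specific to the paper; as a hedged stand-in, one plausible form scores each view's contribution per voxel with a small shared MLP and takes a softmax-weighted sum:

```python
import torch
import torch.nn as nn

class AdaptiveFusion(nn.Module):
    """Softmax-weighted multi-view fusion (an illustrative stand-in)."""

    def __init__(self, channels):
        super().__init__()
        self.score = nn.Sequential(        # scores each view at each voxel
            nn.Linear(channels, channels),
            nn.ReLU(inplace=True),
            nn.Linear(channels, 1))

    def forward(self, per_view):                           # (N, M, C)
        weights = torch.softmax(self.score(per_view), 0)   # (N, M, 1), over views
        return (weights * per_view).sum(dim=0)             # (M, C)
```

A learned weighting of this kind lets well-informed views dominate where others are ambiguous; a plain mean, by contrast, would treat all views equally.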

Conclusion


In this paper, we presented a new framework for reconstructing sparse-view CBCT images. Our method incorporates the inherent geometric properties of X-ray perspective projection during feature backprojection, ensuring accurate information retrieval from multiple views. Furthermore, by leveraging prior knowledge learned from an extensive dataset, our framework effectively addresses the challenges of sparse-view input and achieves superior reconstruction quality. Both the effectiveness and the time efficiency of our approach are rigorously assessed through comprehensive testing on simulated and real-world datasets.


Results


Fig. 5 presents a side-by-side comparison of 3D CBCT images reconstructed in axial slices from case #10 of the dental dataset. It is evident that the FDK approach struggles with sparse-view input, leading to significant streaky artifacts because of the limited number of input views. Although SART reduces these artifacts, it often loses fine details in the process. When it comes to neural rendering-based methods, NAF achieves decent results with 20 views by incorporating neural rendering with hash encoding. Yet, its performance greatly diminishes with very few input views (such as 5 or 10), as it is optimized for individual objects and lacks prior knowledge learned from the data population. SCOPE3D shows similar performance, as the re-projection strategy offers negligible new information. SNAF demonstrates improvements due to its view augmentation strategy but still struggles with 10 or 5 views. PatRecon ignores geometric relationships among multi-view projections, which results in blurry reconstructions with erroneous structures. Benefiting from CNN layers, PixelNeRF enjoys prior knowledge and maintains multi-view consistency, but it tends to produce noticeable streaky artifacts due to its point-wise MLP decoding and 2D supervision. DIF-Net builds upon the principles of PixelNeRF, achieving better results due to its 3D supervision. Its results with 20 input views are comparable to ours, with slight blurriness and noise as highlighted in the orange box. However, its performance degrades with sparser inputs, such as 5 views, exhibiting streaky artifacts due to its point-wise MLP decoding approach. This is because the point-wise MLP independently decodes the attenuation of each query point, disregarding the spatial relationships among neighboring voxel points in the CBCT image. The MLP decoder is also unable to capture the global structure of the CBCT image under point-wise 3D supervision. As a result, it delivers streaky artifacts, especially when facing extremely sparse input like 5 views. In contrast, our CNN-based decoding module considers interactions among neighboring points, effectively acting as a learnable filter that mitigates noise and extracts more robust feature representations. Moreover, the 3D CNN decoder is capable of capturing global structure thanks to its volume-wise 3D supervision. Consequently, our reconstructed CBCT images exhibit higher quality with fewer streaky artifacts. Notably, our approach surpasses all other methods, providing reconstruction quality comparable to the ground truth with 20 input views. Recovering details with high fidelity does become challenging for our method with 10 or 5 views; despite this limitation, it still maintains a clear advantage over the competition, showing fewer streaky artifacts and preserving a better global structure.

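The contrast drawn above between point-wise MLP decoding and volume-wise decoding is essentially one of receptive fields: an MLP decodes each voxel in isolation, whereas 3D convolutions let every voxel borrow evidence from its neighbors. A minimal sketch of the kind of building block this implies (our own illustration, not the authors' exact decoder architecture):

```python
import torch.nn as nn

def conv3d_block(c_in, c_out):
    """3x3x3 convolutions act as a learnable volumetric filter, smoothing
    view-aligned streaky noise while preserving shared structure."""
    return nn.Sequential(
        nn.Conv3d(c_in, c_out, kernel_size=3, padding=1),
        nn.InstanceNorm3d(c_out),
        nn.ReLU(inplace=True),
        nn.Conv3d(c_out, c_out, kernel_size=3, padding=1),
        nn.InstanceNorm3d(c_out),
        nn.ReLU(inplace=True))
```

Stacking such blocks (optionally with upsampling) and supervising the full output volume gives the decoder access to global structure that a point-wise loss cannot provide.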

Figure


Fig. 1. CBCT imaging and image reconstruction. CBCT imaging (a) produces a series of 2D X-ray projections (b), which are then used to reconstruct a 3D CBCT image (c).


Fig. 2. Geometric arrangement of CBCT imaging and X-ray projection simulation.


Fig. 3. Overview of the proposed method, which involves three main stages: first, a 2D CNN encoder extracts feature representations from the multi-view X-ray projections; second, a comprehensive 3D feature map is constructed via feature backprojection and adaptive feature fusion; finally, the 3D feature map is fed into a 3D CNN decoder to generate the final CBCT image.


Fig. 4. Coordinate conversion of the query point used in feature backprojection.


Fig. 5. Qualitative comparison of case #10 from the dental dataset (axial slice). Window: [-1000, 2000] HU.


Fig. 6. Qualitative comparison with SNAF and DIF-Net on case #9 from the dental dataset. From top to bottom: axial, coronal, and sagittal slices. Window: [-1000, 2000] HU.


Fig. 7. Qualitative comparison of case #16 from the spine dataset (sagittal slice). Window: [-1000, 1000] HU.


Fig. 8. Qualitative comparison of case #1 from the walnut dataset (axial slice). Window: [-1000, 2000] HU.


Fig. 9. Qualitative comparison with DDS3D on case #16 from the spine dataset. From top to bottom: axial, coronal, and sagittal slices. Window: [-1000, 1000] HU.


Fig. 10. Qualitative results of the ablation study on the feature fusion strategy for case #1 from the dental dataset (coronal slice). Window: [-1000, 2000] HU.


Fig. 11. Qualitative results of the ablation study on the loss terms for case #6 from the dental dataset (coronal slice). Window: [-1000, 2000] HU.


Fig. 12. Quantitative results of the ablation study on the weights of the gradient loss and the projection loss on the dental dataset.


Fig. 13. Model robustness to noisy data.


Fig. 14. Qualitative results of the robustness analysis on angle sampling for case #10 from the dental dataset (axial slice). Window: [-1000, 2000] HU.


Fig. 15. Qualitative results of the robustness analysis on the number of input views for case #1 from the dental dataset (axial slice). Window: [-1000, 2000] HU.


Table


TABLE I. Quantitative comparison on the dental dataset. The best performance is highlighted in bold.


TABLE II. Quantitative comparison on the spine dataset. The best performance is highlighted in bold.


TABLE III. Quantitative comparison on the walnut dataset. The best performance is highlighted in bold.


TABLE IV. Efficiency comparison. The best performance is highlighted in bold. Units: Time in seconds; Memory in GB (GPU memory consumption during training); Size in MB (model size).


TABLE V. Quantitative results of the ablation study on the feature fusion strategy on the dental dataset.


TABLE VI. Quantitative results of the ablation study on the loss terms on the dental dataset.


TABLE VII. Ablation study on the downsampling ratio S. Units: Time in seconds; Size in MB; Memory in GB (GPU memory consumption during training).


TABLE VIII. Quantitative results of the robustness analysis on reconstruction resolution on the dental dataset.


TABLE IX. Quantitative results of the robustness analysis on angle sampling on the dental dataset.


TABLE X. Quantitative results of the robustness analysis on the number of input views on the dental dataset.

