
Weekly Paper Reading Notes (2) - Review - A Survey on Deep Learning in Medical Image Analysis


I am new to this field and not a strong writer; these notes may contain translation errors and misunderstandings of the content, and I sincerely welcome guidance and criticism from readers.

The paper is divided into four sections:
  • deep learning methods (the first part, not elaborated again here);
  • their uses in medical imaging, where the applications are especially prominent;
  • the application domains;
  • the challenges faced and the future outlook.

原文链接:https://www.sciencedirect.com/science/article/pii/S1361841517301135

Table of Contents

  • 1.Deep learning methods
  • 2.Deep learning uses in medical imaging
    • 2.1 Classification
      • 2.1.1 Image/exam classification
      • 2.1.2 Object or lesion classification
    • 2.2 Detection
      • 2.2.1 Organ, region and landmark localization
      • 2.2.2 Object or lesion detection
    • 2.3 Segmentation
      • 2.3.1 Organ and substructure segmentation
      • 2.3.2 Lesion segmentation
    • 2.4 Registration
    • 2.5 Other tasks in medical imaging
      • 2.5.1 Content-based image retrieval (CBIR)
      • 2.5.2 Image generation and enhancement
      • 2.5.3 Combining image data with reports
  • 3.Application areas
    • 3.1 Brain
    • 3.2 Eye
    • 3.3 Chest
    • 3.4 Digital pathology and microscopy
    • 3.5 Breast
    • 3.6 Cardiac
    • 3.7 Abdomen
    • 3.8 Musculoskeletal
    • 3.9 Other
  • 4.Discussion
    • 4.1 Overview
    • 4.2 Key aspects of successful deep learning methods
    • 4.3 Unique challenges in medical image analysis
    • 4.4 Outlook

1.Deep learning methods

2.Deep learning uses in medical imaging

2.1 Classification

2.1.1 Image/exam classification

Image or exam classification was one of the first areas in which deep learning made a major contribution to medical image analysis.

In exam classification, one or more images (an exam) serve as input, and the system outputs a single diagnostic variable, such as the presence or absence of a disease.

Dataset sizes are small -> transfer learning

Two primary transfer learning approaches have been identified:
(1) Pre-trained networks are utilized as feature extractors.
(2) Pre-trained networks undergo fine-tuning when applied to medical datasets.
The former has the added benefit of not requiring any training of a deep network, so the extracted features can be plugged into existing image analysis pipelines. Both approaches are popular and widely applied, but few authors have investigated which strategy actually yields better results.
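As a rough illustration (my own sketch, not taken from the survey), the code below shows both strategies with a torchvision ResNet-18 pre-trained on ImageNet. The 2-class output, input sizes, and optimizer settings are placeholder assumptions, and the `weights=` argument assumes a recent torchvision release.

```python
# Sketch of the two transfer-learning strategies discussed above, using a
# torchvision ResNet-18 pre-trained on ImageNet. The 2-class output (e.g.
# disease present/absent) and all sizes are illustrative assumptions.
import torch
import torch.nn as nn
from torchvision import models

num_classes = 2  # e.g. disease present / absent

# Strategy (1): pre-trained network as a fixed feature extractor.
backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
feature_extractor = nn.Sequential(*list(backbone.children())[:-1])  # drop the fc layer
feature_extractor.eval()
with torch.no_grad():
    x = torch.randn(4, 3, 224, 224)          # a batch of images
    feats = feature_extractor(x).flatten(1)   # 512-d features, feed to e.g. an SVM

# Strategy (2): fine-tune the pre-trained network on the medical dataset.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, num_classes)  # replace the classifier head
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)
# ...train with a standard cross-entropy loss on the (small) medical dataset.
```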

Methods:
(1) Early work mainly relied on unsupervised pre-training and architectures such as stacked auto-encoders (SAEs) and restricted Boltzmann machines (RBMs).
(2) CNNs have since been applied in diverse application areas, including brain MRI, retinal imaging, digital pathology, and lung CT.
(3) More recent studies typically train their own network architectures from scratch rather than relying on pre-trained networks.
(4) Three papers designed architectures tailored to unique characteristics of medical data, for example by exploiting 3D information.

For medical image classification, CNNs are currently the standard technique. In particular, CNNs pre-trained on natural images have shown surprisingly strong performance, in some tasks even rivaling human experts. Finally, researchers have shown that CNNs can be adapted to exploit the intrinsic structure of medical images.

2.1.2 Object or lesion classification

Object classification usually focuses on classifying a small, previously identified part of a medical image into one of several categories, such as nodule classification in chest CT.

Accurate classification often requires both local information on lesion appearance and global contextual information on lesion location, a combination that generic deep learning architectures do not handle well.

Methods:
(1) Most recent work uses end-to-end trained CNNs.
(2) Several papers address the context problem with multi-scale (multi-stream) architectures.
(3) Examples include three CNNs that each take a nodule patch at a different scale, a combination of a CNN and an RNN for nuclear cataract grading, and a 3D CNN for high-grade glioma. A minimal sketch of the multi-scale idea follows below.
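The sketch below is a hypothetical PyTorch illustration of the multi-scale idea, not any specific published architecture: three small streams process patches of the same lesion at different scales and their features are fused for classification. Patch sizes, channel counts, and the 2-class output are arbitrary assumptions.

```python
# Minimal multi-stream CNN sketch: three streams process patches of the same
# lesion at different scales; their features are concatenated for classification.
import torch
import torch.nn as nn

def make_stream(in_ch=1):
    return nn.Sequential(
        nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU(),
        nn.MaxPool2d(2),
        nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1),  # -> (N, 32, 1, 1), independent of patch size
        nn.Flatten(),
    )

class MultiScaleNet(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        self.streams = nn.ModuleList([make_stream() for _ in range(3)])
        self.classifier = nn.Linear(3 * 32, num_classes)

    def forward(self, patches):   # patches: list of 3 tensors, one per scale
        feats = [s(p) for s, p in zip(self.streams, patches)]
        return self.classifier(torch.cat(feats, dim=1))

# Example: the same nodule cropped at 32x32, 64x64 and 128x128.
net = MultiScaleNet()
patches = [torch.randn(8, 1, s, s) for s in (32, 64, 128)]
logits = net(patches)   # (8, 2)
```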

In certain scenarios, alternative architectures and methodologies are employed, including RBMs (Restricted Boltzmann Machines), sparse autoencoders (SAEs), and convolutional sparse auto-encoders (CSAEs). The primary distinction between CSAEs and conventional CNNs lies in the application of unsupervised pre-training using sparse auto-encoders.

An interesting approach, particularly in cases where generating training data would otherwise require object-level annotations, is to combine multiple-instance learning (MIL) with deep learning.

Object classification makes less use of pre-trained networks than exam classification, mostly because of the need to incorporate contextual or 3D information.

2.2 Detection

2.2.1 Organ, region and landmark localization

The localization of anatomical structures, in space or time, is an important preprocessing step for segmentation and for clinical workflows such as therapy planning and intervention. In medical imaging, localization often requires parsing of 3D volumes.

Methods:
Space:
(1)Variants have been developed to address the challenge of 3D data parsing using deep learning algorithms, treating the 3D space as a combination of 2D orthogonal planes.
(2)Other authors seek to adjust the network learning process by aiming to directly predict locations; however, due to its increased complexity, only a limited number of techniques address the direct localization of landmarks and regions in the 3D image space.
Time:
(1)Convolutional Neural Networks (CNNs) have been employed for identifying scan planes or key frames in temporal data.
(2)Recurrent Neural Networks (RNNs), especially Long Short-Term Memory (LSTM) networks, have been utilized to analyze temporal information within medical videos, another form of high-dimensional data.
(3)A combination of LSTM-RNN and CNN was implemented.

Summary:
The most common approach to organ, region, and landmark detection treats the problem as 2D image classification with CNNs, with satisfactory results.
Some authors instead build on existing frameworks by modifying the learning process so that the network is optimized directly for accurate localization.
We anticipate further exploration of these strategies will reveal their versatility in addressing diverse localization tasks (e.g., multiple landmarks).
Additionally, RNNs have demonstrated significant potential in temporal domain applications, while multidimensional RNNs could also prove valuable for spatial positioning tasks.

2.2.2 Object or lesion detection

The identification of objects of interest or lesions within images serves as a critical component in diagnostic processes and represents a highly labor-intensive task for clinicians. Typically, such tasks involve both the localization and classification of small lesions within the full image space.

Methods:
(1) Computer-aided detection (CADe) systems that automatically identify abnormalities have a long research history. The first object detection system using CNNs dates back to 1995, when a four-layer CNN was used to detect nodules in X-ray images.

(2) Most published deep learning-based object detection systems use CNNs for pixel (or voxel) classification, followed by post-processing to obtain object candidates. Since the classification task performed at each pixel is essentially object classification, the CNN architectures and methodologies are very similar to those used for object classification. Contextual or 3D information is again incorporated with multi-stream CNNs.

Differences from object classification: because every pixel (or voxel) is classified, the class balance during training is skewed heavily towards the non-object class, and most non-object samples are easy to discriminate, which hampers training. In addition, naively applying a CNN in a sliding-window fashion to classify each pixel leads to a large amount of redundant computation, which fully convolutional networks (fCNNs) avoid. A sketch of hard negative mining, one common remedy for the imbalance issue, is given below.
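The following is a minimal PyTorch sketch (my own illustration) of online hard negative mining for a binary pixel/patch classifier; the 3:1 negative-to-positive ratio is an arbitrary choice, not a value from the survey.

```python
# Sketch of online hard negative mining for a pixel/patch classifier:
# keep all positive samples, but only the k hardest (highest-loss) negatives.
import torch
import torch.nn.functional as F

def hard_negative_loss(logits, targets, neg_pos_ratio=3):
    """logits: (N, 2), targets: (N,) long tensor with 1 = lesion, 0 = background."""
    per_sample = F.cross_entropy(logits, targets, reduction="none")
    pos_mask = targets == 1
    pos_loss = per_sample[pos_mask]
    neg_loss = per_sample[~pos_mask]
    k = min(neg_loss.numel(), max(1, neg_pos_ratio * int(pos_mask.sum())))
    hard_neg_loss, _ = neg_loss.topk(k)          # hardest negatives only
    return torch.cat([pos_loss, hard_neg_loss]).mean()

# Usage: loss = hard_negative_loss(model(patches), labels)
```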

Summary:
The challenges are similar to those in object classification.
Only a few papers directly address problems specific to object detection, such as class imbalance / hard negative mining and efficient pixel- or voxel-wise processing of full images.
We expect these topics to receive more attention in the near future, for example through the application of multi-stream architectures within fully convolutional networks.

2.3 Segmentation

2.3.1 Organ and substructure segmentation

Segmentation of organs and other substructures in medical images enables quantitative analysis of volumetric and morphometric parameters, for example in cardiac and brain analysis. It is also often an essential first step in computer-aided detection pipelines.

Segmentation tasks are commonly conceptualized as determining the collection of voxels comprising either the object's boundary or its internal structure.

Methods:
(1) Segmentation has inspired some of the best-known CNN architectures in medical image analysis, of which U-Net is the most famous.
(2) Recurrent Neural Networks (RNNs) have also become increasingly popular for segmentation.
(3) Many researchers have obtained impressive segmentation results with patch-trained neural networks. More recently, fully convolutional networks (fCNNs) have been preferred over sliding-window classification, mainly to avoid redundant computation; fCNNs have also been extended to 3D and applied to multi-class segmentation of several targets at once. A minimal encoder-decoder sketch in the spirit of U-Net is given below.
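The sketch assumes PyTorch and reduces the idea to a single down/up level with one skip connection; the published U-Net uses several levels and far more channels, so this is only an illustration of the pattern, not the original architecture.

```python
# Minimal encoder-decoder segmentation sketch in the spirit of U-Net:
# one down-sampling level, one up-sampling level, and a single skip connection.
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(),
    )

class TinyUNet(nn.Module):
    def __init__(self, in_ch=1, num_classes=2):
        super().__init__()
        self.enc = conv_block(in_ch, 16)
        self.down = nn.MaxPool2d(2)
        self.bottleneck = conv_block(16, 32)
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)
        self.dec = conv_block(32, 16)          # 16 (skip) + 16 (upsampled)
        self.head = nn.Conv2d(16, num_classes, 1)

    def forward(self, x):
        e = self.enc(x)
        b = self.bottleneck(self.down(e))
        d = self.dec(torch.cat([self.up(b), e], dim=1))
        return self.head(d)                    # per-pixel class logits

seg = TinyUNet()
out = seg(torch.randn(2, 1, 128, 128))         # (2, 2, 128, 128)
```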

A drawback of voxel-wise classification is that it can produce spurious responses and spatially inconsistent segmentations. Researchers have therefore combined fCNNs with graphical models such as Markov Random Fields (MRFs) and Conditional Random Fields (CRFs) to improve segmentation accuracy. In most studies these graphical models are applied on top of the CNN's likelihood maps, acting as constraints on the label assignment.

In summary, segmentation in medical imaging has seen a huge influx of deep learning methods. Custom architectures designed directly for the segmentation task have been proposed and have obtained results that rival or surpass those of standard fCNNs.

2.3.2 Lesion segmentation

Lesion segmentation combines the challenges of object detection with those of organ and substructure segmentation. (1) Global and local context are both needed for accurate segmentation; this is often achieved with multi-stream networks whose patches are sampled at different scales or non-uniformly. (2) Another common approach is to use U-net-like architectures, which naturally combine global and local context.

Another challenge is class imbalance, since lesion voxels are usually vastly outnumbered by background voxels. Proposed solutions include: (1) adapting the loss function, e.g. by assigning higher weights to the under-represented class (a sketch follows below); (2) augmenting the under-represented class with synthetic samples through data augmentation.
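A minimal PyTorch sketch of solution (1): weight the cross-entropy loss so that the rare lesion class counts more than the background class. The 1:20 weighting is purely illustrative; in practice weights are usually derived from class frequencies.

```python
# Class-weighted cross-entropy for per-pixel lesion segmentation.
import torch
import torch.nn as nn

class_weights = torch.tensor([1.0, 20.0])        # [background, lesion] -- illustrative
criterion = nn.CrossEntropyLoss(weight=class_weights)

logits = torch.randn(2, 2, 128, 128)             # per-pixel logits from the network
labels = torch.randint(0, 2, (2, 128, 128))      # per-pixel ground truth
loss = criterion(logits, labels)
```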

Summary:
Lesion segmentation combines methods from object detection and organ-level segmentation; since the challenges largely overlap, advances in those areas are likely to carry over naturally.

2.4 Registration

Image registration, i.e. the spatial alignment of medical images, is a fundamental task in image analysis. Typically, a coordinate transformation is computed between pairs of images within an iterative framework in which a specific type of transformation (parametric or non-parametric) is assumed and a predetermined similarity metric is optimized.

Methods:
Researchers have found that deep networks can help achieve the best possible registration performance.
Broadly speaking, two strategies are prevalent in the current literature:
(1) using deep networks to estimate a similarity measure between two images, which then drives an iterative optimization strategy;
(2) directly predicting transformation parameters with a deep regression network (see the sketch below).
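A sketch of strategy (2), assuming PyTorch and a rigid 3D transform with 6 parameters (an illustrative choice, not a specific method from the survey): a regression CNN takes the fixed and moving volumes as two channels and predicts the transform parameters directly.

```python
# Regression CNN that predicts transformation parameters from an image pair.
import torch
import torch.nn as nn

class RegistrationRegressor(nn.Module):
    def __init__(self, n_params=6):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(2, 16, 3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
            nn.Conv3d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool3d(1),
        )
        self.regress = nn.Linear(32, n_params)

    def forward(self, fixed, moving):
        x = torch.cat([fixed, moving], dim=1)    # (N, 2, D, H, W)
        return self.regress(self.features(x).flatten(1))

net = RegistrationRegressor()
params = net(torch.randn(1, 1, 32, 64, 64), torch.randn(1, 1, 32, 64, 64))  # (1, 6)
# Training would minimize e.g. the error against known transform parameters.
```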

Summary:
Although deep learning is already widely used for classification and segmentation, integrating it into registration methods still faces many challenges. The literature on this topic is limited, and the existing studies each adopt rather different solutions.

2.5 Other tasks in medical imaging

2.5.1 Content-based image retrieval

Content-based image retrieval (CBIR) represents a method for extracting meaningful knowledge from vast data repositories. It provides the potential to locate comparable case files and assess uncommon medical conditions, ultimately enhancing patient treatment outcomes.

The main challenge in developing CBIR methods is extracting effective feature representations from pixel-level data and associating them with meaningful concepts.

Methods:
All existing methods rely on pre-trained CNNs to extract feature descriptors from medical images.
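As an illustration of this setup (my own sketch, assuming torchvision and ImageNet weights, not a specific method from the survey): global descriptors are extracted with a pre-trained ResNet-18 and database images are ranked by cosine similarity. All sizes are placeholders.

```python
# CBIR sketch: pre-trained CNN descriptors + cosine-similarity retrieval.
import torch
import torch.nn as nn
from torchvision import models

backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
encoder = nn.Sequential(*list(backbone.children())[:-1])   # 512-d global descriptor
encoder.eval()

@torch.no_grad()
def describe(images):                       # images: (N, 3, 224, 224)
    feats = encoder(images).flatten(1)
    return nn.functional.normalize(feats, dim=1)

database = describe(torch.randn(100, 3, 224, 224))   # pre-computed descriptors
query = describe(torch.randn(1, 3, 224, 224))
scores = query @ database.T                            # cosine similarities
top5 = scores.topk(5).indices                          # most similar cases
```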

Summary:
However, despite these findings, content-based image retrieval as a whole has not yet achieved significant progress from deep learning methods. It remains to be seen whether this will change in the near future.
A promising research track involves directly utilizing deep networks for retrieval tasks.

2.5.2 Image generation and enhancement

A wide variety of image generation and enhancement methods based on deep architectures have been proposed, ranging from removing obstructing elements in images, normalizing images, improving image quality, and data completion, to pattern discovery.

Methods:
In the domain of image generation, 2D and 3D convolutional neural networks (CNNs) are utilized to transform a single input image into another. Typically, these architectures do not include pooling layers that are commonly found in classification networks.
Multi-stream CNNs allow for the generation of high-resolution images from multiple low-resolution inputs.

Summary:
Image generation with deep networks has produced remarkable results in a wide range of original applications.

2.5.3 Combining image data with reports

Combining textual reports with medical imaging data has spawned two primary research directions:
(1) utilizing reports to boost image classification precision,
(2) creating text reports from image data.
The latter approach draws inspiration from recent caption generation papers on natural images.

The abundance of data accessible within PACS systems, particularly regarding images and their corresponding diagnostic reports, suggests a promising direction for future deep learning studies. It is reasonable to anticipate that advancements in image description will eventually be implemented across these datasets.

3.Application areas

We highlight major contributions and discuss the performance of systems on large datasets and public challenge datasets. An overview of these public challenges can be found at http://www.grand-challenge.org.

3.1 Brain

Deep neural networks have been widely applied to brain image analysis across several different application domains.

A significant number of studies address the classification of Alzheimer's disease and the segmentation of brain tissue and anatomical structures such as the hippocampus. Other important areas are the detection and segmentation of lesions such as tumors, white matter lesions, lacunes, and micro-bleeds.

Apart from methods that address global-level classification (e.g., Alzheimer's disease diagnosis), most approaches learn mappings from local patches to representations, and from representations to labels. However, local patches may lack the contextual information required for some tasks.

To address this challenge, Ghafoorian et al. (2016b) used patches of varying density, gradually lowering the sampling rate towards the patch borders to cover a larger context. Another approach, taken by several groups, is multi-scale analysis with fusion of the representations in a fully connected layer.

[Methods]
Even though brain images are 3D volumes in all surveyed studies, most methods work in 2D, analyzing the 3D volumes slice-by-slice. This is often motivated by the reduced computational requirements or by the thick slices relative to the in-plane resolution in some data sets. More recent publications have also employed 3D networks.

[Summary]
In public challenges such as BRATS, ISLES, and MRBrainS, the top-ranking teams to date have all used CNNs.

The majority of the methods described above focus on brain MRI. We anticipate that other modalities, such as CT and US, will also benefit from deep learning-based analysis.

3.2 Eye

Ophthalmic imaging

Many studies utilize basic convolutional neural networks for the investigation of color fundus imaging (CFI).

A wide variety of applications are addressed: segmentation of anatomical structures, segmentation and detection of retinal abnormalities, diagnosis of eye diseases, and image quality assessment.

Kaggle organized a diabetic retinopathy detection competition: over 35,000 color fundus images (CFI) were provided to train algorithms to predict disease severity in 53,000 test images. The majority of the competing teams used end-to-end trained CNNs.

3.3 Chest

In thoracic image analysis, with both radiography (X-ray) and computed tomography (CT), the detection, characterization, and classification of nodules is one of the most frequently addressed problems.

In chest X-ray imaging, teams utilize a single system to detect multiple diseases. In CT scans, the identification of textural features related to interstitial lung diseases is also a well-researched topic.

The LUNA16 challenge is an important benchmark for computer-aided nodule detection in chest CT. CNN architectures are used by all of the top-performing systems. Some of these systems classify candidate nodules produced by rule-based image processing, while others perform the candidate detection step itself with deep networks, with excellent results.

Kaggle Data Science Bowl 2017: estimation of the probability that an individual has lung cancer from a CT scan.

3.4 Digital pathology and microscopy

3.5 Breast

3.6 Cardiac

Deep learning techniques have been extensively employed across various domains related to cardiac image analysis.

[Domains]
MRI is one of the most studied imaging techniques with left ventricle segmentation being a frequently performed procedure.
Application areas include segmentation, tracking, slice classification, image quality assessment, automatic calcium scoring, and coronary centerline tracking.

[Methods]
(1) Most papers use simple 2D CNNs and analyze the 3D (and sometimes 4D) data slice by slice.
(2) The notable exception is Wolterink et al. (2016), who used a 3D CNN.
(3) DBNs (deep belief networks) are used in four papers, all from the same author group; the DBNs are only used for feature extraction and are integrated into compound segmentation frameworks.
(4) Two papers combine CNNs with RNNs.

[Challenge]
Kaggle Data Science Bowl 2015: automatically measure end-systolic and end-diastolic volumes in cardiac MRI.

3.7 Abdomen

3.8 Musculoskeletal

3.9 Other

4.Discussion

4.1 Overview

(1) Early investigations employed pre-trained CNNs as feature extraction tools.
(2) In recent years, end-to-end trained CNNs have become the standard practice in medical image analysis.

4.2 Key aspects of successful deep learning methods

Although CNNs and their derivatives have become the top performers across medical image analysis, the exact architecture is not the most important determinant of a good solution.

(1)Expert knowledge about the task to be solved can provide advantages that go beyond adding more layers to a CNN.(e.g. novel data preprocessing or augmentation techniques.)
(2)Designing architectures incorporating unique task-specific properties can obtain better results than straightforward CNNs. (e.g. multi-view and multi-scale networks).
Other important parts of network design are the input size and the receptive field, i.e. the area in input space that contributes to a single output unit.
(3) Model hyper-parameter optimization (e.g. learning rate, dropout rate) is a highly empirical exercise and is of secondary importance for performance compared with the topics above and with training data quality.
Solutions: intuition-guided random search tends to work well enough; Bayesian hyper-parameter optimization methods exist but have not yet been applied in medical image analysis. A random-search sketch is given below.
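The sketch below illustrates intuition-guided random search; `train_and_validate` is a hypothetical placeholder for the reader's own training routine, and the sampling ranges simply reflect common practice rather than values from the survey.

```python
# Random search over hyper-parameters (learning rate, dropout).
import math
import random

def random_search(train_and_validate, n_trials=20):
    best = (None, -math.inf)
    for _ in range(n_trials):
        config = {
            "lr": 10 ** random.uniform(-5, -2),      # log-uniform over 1e-5 .. 1e-2
            "dropout": random.uniform(0.0, 0.5),
        }
        score = train_and_validate(**config)          # returns a validation score
        if score > best[1]:
            best = (config, score)
    return best
```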

4.3 Unique challenges in medical image analysis

(1) The lack of large training datasets is often mentioned as an obstacle. However, the main challenge is not the availability of the image data itself, but the acquisition of relevant annotations/labels for these images.

Automatically turning reports into accurate annotations or structured labels requires sophisticated text-mining methods, a major research field in itself in which deep learning is now also a key tool.

Solution

(2) Label noise: expert annotations may disagree and no consensus may have been enforced. Building a deep learning system on such data requires careful handling of the noise and uncertainty in the reference standard. Solutions such as incorporating labeling uncertainty directly into the loss function remain an open challenge.

(3) In medical imaging, classification or segmentation is often formulated as a binary task: normal versus abnormal, or object versus background. However, this is a strong oversimplification, as both classes can be highly heterogeneous. One could turn the system into a multi-class one by providing detailed annotations of all possible subclasses, but this is not practical given the annotation effort required. Alternatively, one could build intelligence into the training process itself, for example through stratified sampling or hard example mining, but such approaches can fail when there is substantial noise in the reference standard.

(4) Class imbalance: in medical imaging, images of the abnormal class may be rare and hard to obtain. Solution: apply specific data augmentation algorithms.

(5) Physicians often leverage a wealth of data on patient history, age, demographics and others to arrive at better decisions. Some authors have already investigated combining this information into deep learning networks in a straightforward manner. The improvements that were obtained were not as large as expected.
One of the challenges is to balance the number of imaging features in the deep learning network (typically thousands) with the number of clinical features (typically only a handful) to prevent the clinical features from being drowned out.
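As a minimal sketch of such a straightforward combination (my own illustration, assuming PyTorch): a handful of clinical features are concatenated with the CNN's imaging features before the final classifier. The dimensions are illustrative, and the mismatch (512 imaging features vs. four clinical ones) makes the "drowning out" concern visible.

```python
# Concatenating a few clinical features with CNN imaging features.
import torch
import torch.nn as nn

class ImageAndClinicalNet(nn.Module):
    def __init__(self, img_feat_dim=512, n_clinical=4, num_classes=2):
        super().__init__()
        self.cnn = nn.Sequential(                 # stand-in for a real image backbone
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, img_feat_dim),
        )
        self.classifier = nn.Linear(img_feat_dim + n_clinical, num_classes)

    def forward(self, image, clinical):
        feats = torch.cat([self.cnn(image), clinical], dim=1)
        return self.classifier(feats)             # 512 imaging dims vs. 4 clinical dims

net = ImageAndClinicalNet()
logits = net(torch.randn(8, 1, 64, 64), torch.randn(8, 4))   # (8, 2)
```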

4.4 Outlook

(1) Several high-profile successes of deep learning in medical imaging have been reported, such as the work by Esteva et al. (2017) in dermatology and Gulshan et al. (2016) in ophthalmology.
However, ① both focus on classifying small 2D color images; ② this also allowed the authors to use networks pre-trained on a very well-labeled dataset.
In contrast, ① most medical imaging tasks involve 3D gray-scale or multi-channel images, for which pre-trained networks or architectures do not exist; ② such data has very specific challenges, e.g. anisotropic voxel sizes, small registration errors between channels (e.g. in multi-parametric MRI), and varying intensity ranges; ③ and although many tasks in medical image analysis can be posed as classification problems, this is not always the optimal strategy, as it typically requires some form of post-processing with non-deep-learning methods.

(2) An important domain that can play a significant role in medical imaging and is attracting growing attention: unsupervised learning.

Unsupervised methods are appealing because ① they allow networks to be (pre-)trained on the abundance of unlabeled data available worldwide, and ② they are more analogous to how humans learn.

Unsupervised methods: ① variational auto-encoders (VAEs) and ② generative adversarial networks (GANs). The first combines variational Bayesian graphical models with neural networks acting as encoders/decoders. The second pits two (convolutional) neural networks against each other: one generates artificial data samples while the other distinguishes artificial from real samples. Both have stochastic components, are generative networks, and, most importantly, can be trained end-to-end to learn meaningful features without labeled data. A minimal GAN sketch is given below.
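The sketch assumes PyTorch and matches the description above: a generator maps noise to images and a discriminator separates real from generated samples. Sizes are illustrative and the architectures are reduced to bare linear layers.

```python
# Minimal GAN sketch: generator maps noise to images, discriminator tells real from fake.
import torch
import torch.nn as nn

latent_dim = 64

generator = nn.Sequential(
    nn.Linear(latent_dim, 128), nn.ReLU(),
    nn.Linear(128, 28 * 28), nn.Tanh(),          # fake 28x28 image, flattened
)
discriminator = nn.Sequential(
    nn.Linear(28 * 28, 128), nn.LeakyReLU(0.2),
    nn.Linear(128, 1),                           # real/fake logit
)
bce = nn.BCEWithLogitsLoss()

z = torch.randn(16, latent_dim)
fake = generator(z)
real = torch.rand(16, 28 * 28)                   # placeholder for real training images

# Discriminator step: real -> 1, fake -> 0 (fake is detached here).
d_loss = bce(discriminator(real), torch.ones(16, 1)) + \
         bce(discriminator(fake.detach()), torch.zeros(16, 1))
# Generator step: fool the discriminator into predicting 1 for fake samples.
g_loss = bce(discriminator(fake), torch.ones(16, 1))
```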

(3) Deep learning methods are often described as 'black boxes.' In medicine, a system that merely predicts well is frequently not enough; it also needs to be able to explain or justify its decisions in some way.

Several techniques have been developed to explain how the intermediate layers of a CNN respond to input stimuli.

We also anticipate that deep learning will be applied to related, largely unexplored tasks in medical imaging, such as image reconstruction (Wang, 2016).
