
Ocular Disease Recognition Using Convolutional Neural Networks


About this project

This project was part of the Algorithms for Massive Data course at the University of Milan. The goal was to build a deep learning model that diagnoses eye diseases from retinal fundus images, using TensorFlow as the computational framework. A key requirement was that training should scale, so I needed an efficient data pipeline able to process large numbers of data points. In this article, I summarize my research on convolutional neural networks and on building efficient data pipelines with TensorFlow Dataset objects. The entire code, including reproducible experiments, is available in my GitHub repository: https://github.com/GrzegorzMeller/AlgorithmsForMassiveData


Introduction

Early detection of ocular diseases is a cost-effective way to prevent blindness caused by diabetes, glaucoma, cataract, age-related macular degeneration (AMD), and many other conditions. According to the World Health Organization (WHO), at least 2.2 billion people worldwide have some form of visual impairment, and approximately 1 billion of these cases could have been prevented [1]. Fast disease detection is key to reducing ophthalmologists' workload and preserving patients' vision. Given high-quality fundus images, computer vision and deep learning can detect ocular diseases automatically. In this article, I present experiments aimed at building such a diagnostic tool with convolutional neural networks implemented in TensorFlow.


Dataset

Ocular Disease Intelligent Recognition (ODIR) is a structured ophthalmic database of 5,000 patients. It contains each patient's age, color fundus photographs of the left and right eyes, and diagnostic keywords written by doctors. The dataset is meant to represent real-life patient information collected by Shanggong Medical Technology Co., Ltd. from different hospitals and medical centers in China. The fundus images were captured with various commercially available cameras, such as Canon, Zeiss, and Kowa, which results in varied image resolutions. The annotations were labeled by trained human readers under quality-control management [2]. Patients are classified into eight labels: normal (N), diabetes (D), glaucoma (G), cataract (C), AMD (A), hypertension (H), myopia (M), and other diseases/abnormalities (O).


After a preliminary exploration of the data, I found the following main challenges in the ODIR dataset:



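· Highly imbalanced data. Most images are classified as normal (1,140 instances), while specific diseases such as hypertension have only 100 instances in the dataset.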

· Multi-label diseases. Each eye can have not just one disease but a combination of several.


· Images labeled as "other diseases/abnormalities" (O) are associated with more than 10 different diseases, which greatly increases the variability within this class.


· Very large and varying image resolutions. Most images are approximately 2976×2976 or 2592×1728 pixels.


All these issues take a significant toll on accuracy and other metrics.


Data Pre-Processing

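First, all images are resized. At the beginning, I wanted to resize the images on the fly with the TensorFlow dataset object while the model was training, expecting to avoid a time-consuming one-off resizing step. This turned out to be a poor decision: a single epoch could take as long as 15 minutes. I therefore wrote a separate preprocessing function that resizes all images before the TensorFlow dataset is created and saves them to a different directory. Because the data are resized only once, I could experiment with different training approaches much faster. Initially, all images were resized to 32x32 pixels. I quickly realized that although such a small size speeds up data loading considerably, it discards a lot of important image information, and accuracy was very low. After several experiments, I found that 250x250 pixels gave the best trade-off between training speed and accuracy.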
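
To see why resizing once up front pays off, it helps to compare how much raw pixel data each decoded image holds at the resolutions mentioned above. The sketch below assumes decoded 8-bit RGB images (1 byte per channel per pixel). It ignores JPEG decoding cost, which also grows with pixel count. The helper name `raw_rgb_bytes` is hypothetical.

```python
def raw_rgb_bytes(width, height, channels=3):
    """Size in bytes of a decoded uint8 image tensor (1 byte per channel per pixel)."""
    return width * height * channels

# Original ODIR resolutions versus the two resized sizes tried in the experiments.
for w, h in [(2976, 2976), (2592, 1728), (250, 250), (32, 32)]:
    print(f"{w}x{h}: {raw_rgb_bytes(w, h) / 1e6:.3f} MB")

# How many times more raw data a full-size image carries than a 250x250 one.
print(round(raw_rgb_bytes(2976, 2976) / raw_rgb_bytes(250, 250), 1))
```

A 2976×2976 image carries roughly 140 times more raw data than its 250×250 version. Re-decoding and resizing the originals in every epoch therefore repeats a large amount of work that a one-off preprocessing pass does only once.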

Second, the images are labeled. The annotations in the data.csv file are problematic because each label covers both eyes (left and right) at once, even though each eye may have a different disease. For example, if the left eye has a cataract and the right eye has a normal fundus, the label is cataract, which says nothing correct about the right eye. Fortunately, the diagnostic keywords refer to a single eye. The dataset is designed so that a model takes the left and right images together as input and returns one overall diagnosis for both eyes, which ignores the fact that one eye may be healthy. From the perspective of how such a model would be used in practice, this makes little sense. It is better to predict each eye separately, for example to know which eye should be treated. I therefore enriched the dataset by mapping diagnostic keywords to disease labels, so that each eye is assigned its own proper label(s). Part of this mapping, implemented as a dictionary, is shown in Figure 1. The label information was added by renaming the image files: one or more letters corresponding to specific diseases were appended to each file name. This avoids storing an additional data frame with all labels, and renaming files is fast. It also matches the official TensorFlow documentation, which builds a dataset from file paths alone and retrieves each label from its path [3]. Finally, some annotations are not related to a disease but to poor image quality, such as "lens dust" or "optic disk photographically invisible". I excluded those images because such annotations play no decisive role in determining the patient's disease.



Figure 1: Part of the dictionary that maps diagnostic keywords to disease labels.
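
The keyword-to-label mapping and the "label in the file name" trick can be sketched as follows. This is not the author's code. The dictionary entries, the `<stem>_<letters>.jpg` naming scheme, and all function names are hypothetical. The sketch assumes that one eye's keywords are separated by ASCII or full-width commas, and it drops an image when a quality-only keyword appears.

```python
import os
import re

# Fixed order of the eight ODIR labels; position i of a multi-hot vector is LABELS[i].
LABELS = "NDGCAHMO"

# Hypothetical excerpt of a keyword -> label dictionary (illustration only).
# None marks image-quality annotations whose images are excluded.
KEYWORD_TO_LABEL = {
    "normal fundus": "N",
    "moderate non proliferative retinopathy": "D",
    "glaucoma": "G",
    "cataract": "C",
    "dry age-related macular degeneration": "A",
    "hypertensive retinopathy": "H",
    "pathological myopia": "M",
    "epiretinal membrane": "O",
    "lens dust": None,
    "optic disk photographically invisible": None,
}

def labels_for_keywords(keywords):
    """Map one eye's diagnostic keywords to a set of labels, or None to exclude the image."""
    labels = set()
    for kw in re.split(r"[,，]", keywords):
        kw = kw.strip().lower()
        if kw not in KEYWORD_TO_LABEL:
            continue  # unknown or empty keyword: ignored in this sketch
        if KEYWORD_TO_LABEL[kw] is None:
            return None  # quality-only annotation: drop this image
        labels.add(KEYWORD_TO_LABEL[kw])
    return labels or None

def labeled_filename(filename, labels):
    """Append the label letters to the file name, e.g. '0_left.jpg' -> '0_left_C.jpg'."""
    stem, ext = os.path.splitext(filename)
    return f"{stem}_{''.join(sorted(labels, key=LABELS.index))}{ext}"

def multi_hot_from_path(path):
    """Recover an 8-element multi-hot label vector from a labeled file name."""
    stem = os.path.splitext(os.path.basename(path))[0]
    letters = stem.rsplit("_", 1)[1]
    return [1 if label in letters else 0 for label in LABELS]

print(labeled_filename("5_right.jpg", labels_for_keywords("pathological myopia，moderate non proliferative retinopathy")))
print(multi_hot_from_path("/data/5_right_DM.jpg"))
```

Because the labels live in the path, a pipeline only needs the list of file names. The multi-hot vector can be rebuilt from each path when the dataset is constructed.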

Third, a validation set is created by randomly selecting 30% of all available images. I chose this relatively large proportion because the dataset is small (about 7,000 images in total). I wanted the validation set to be representative enough to avoid bias during evaluation, since with a smaller split many image variants or classes might not be represented in it. The ODIR dataset also includes test images, but unfortunately no labels for them are provided in the data.csv file, so I could not use them to evaluate the model.
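
A random 30% hold-out split of the labeled file paths can be sketched with the standard library alone. The function name, the fixed seed, and the synthetic path list are assumptions for illustration. This is a plain random split, not a stratified one.

```python
import random

def train_val_split(paths, val_fraction=0.3, seed=42):
    """Randomly split file paths into (train, validation) lists."""
    shuffled = list(paths)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is reproducible
    n_val = round(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]

paths = [f"img_{i}.jpg" for i in range(7000)]  # stand-in for ~7,000 image paths
train, val = train_val_split(paths)
print(len(train), len(val))  # → 4900 2100
```

Splitting before any augmentation matters. Augmented copies of minority-class images then exist only in the training set and cannot leak into the validation set.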


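Next, data augmentation is applied to the minority classes in the training set to balance the dataset. The transformations used are random zoom, random rotation, left-right flips, and top-bottom flips. Augmentation can also be applied on the fly during training ("real-time augmentation") to improve generalization [4]. Here, however, it was performed with image-processing libraries such as OpenCV before the TensorFlow dataset was built, to avoid the computational overhead of augmenting on the fly. At this stage, I also considered applying contrast-limited adaptive histogram equalization (CLAHE) to all images to make details more visible. I dropped the idea because it amplified background noise. Figure 2 shows examples of augmentation produced by custom functions built on PIL and OpenCV.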
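
Oversampling a minority class with augmented copies can be sketched in NumPy. This is a simplified stand-in for the PIL/OpenCV functions described above. It uses only flips and multiple-of-90-degree rotations, with no arbitrary-angle rotation or zoom. The function names and the target count of 1,140 (the size of the normal class) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_augment(image, rng):
    """Randomly flip an (H, W, C) image and rotate it by a multiple of 90 degrees."""
    if rng.random() < 0.5:
        image = np.fliplr(image)  # left-right flip
    if rng.random() < 0.5:
        image = np.flipud(image)  # top-bottom flip
    return np.rot90(image, k=int(rng.integers(0, 4)))  # rotates in the (H, W) plane

def oversample(images, target_count, rng):
    """Append augmented copies of randomly chosen images until target_count is reached."""
    out = list(images)
    while len(out) < target_count:
        source = images[int(rng.integers(len(images)))]
        out.append(random_augment(source, rng))
    return out

# e.g. grow a 100-image minority class (like hypertension) to the size of the normal class
minority = [np.zeros((250, 250, 3), dtype=np.uint8) for _ in range(100)]
balanced = oversample(minority, 1140, rng)
print(len(balanced))  # → 1140
```

Because the images are square (250×250), 90-degree rotations keep the shape unchanged. Augmented images can therefore be written back to disk next to the originals and picked up by the same pipeline.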


Fig. 2: Exemplary data augmentation results

Finally, the TensorFlow dataset object is created. It is built very similarly to the one presented in the official TensorFlow tutorial on loading images [5]. The library is complex and not easy for TensorFlow beginners, so I would like to share a summary of my findings on building scalable and fast input pipelines.

The tf.data API lets you build complex input pipelines from simple, reusable pieces. For example, the pipeline for an image model might aggregate data from files in a distributed file system. The API introduces a tf.data.Dataset abstraction that represents a sequence of elements, in which each element consists of one or more components. In my image pipeline, an element is a single training example: a pair of tensors representing the image and its label [6].

Training proceeds iteratively on mini-batches. The model is fed a portion of the data (not the entire dataset) and trained on it, and the process repeats with the next portion. These portions are called batches, and the batch size defines how many examples are drawn at each training step. The weights are updated after every step. I chose a batch size of 32 to limit overfitting, since with a small batch size the weights are updated frequently. The downside of a small batch size is that training takes much longer than with a larger one. Another important feature of tf.data is shuffling: the dataset fills a buffer with elements, randomly samples elements from that buffer, and replaces each selected element with a new one [7]. This prevents batches from being filled repeatedly with images of the same class, which would hurt training.
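
The buffered shuffle and batching behavior described above can be illustrated in plain Python. This is a sketch of the semantics, not TensorFlow's implementation. In an actual tf.data pipeline the same steps are expressed as `dataset.shuffle(buffer_size).batch(32)`. The function names here are hypothetical.

```python
import random

def buffered_shuffle(elements, buffer_size, seed=None):
    """Yield elements in the order produced by a fixed-size shuffle buffer."""
    rng = random.Random(seed)
    it = iter(elements)
    buffer = []
    for x in it:  # fill the buffer with the first buffer_size elements
        buffer.append(x)
        if len(buffer) == buffer_size:
            break
    for x in it:  # emit a random buffered element, refill its slot with the next input
        i = rng.randrange(len(buffer))
        yield buffer[i]
        buffer[i] = x
    rng.shuffle(buffer)  # input exhausted: drain what is left in random order
    yield from buffer

def batches(iterable, batch_size):
    """Group a stream into lists of batch_size; the last batch may be smaller."""
    batch = []
    for x in iterable:
        batch.append(x)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch

# 100 images sorted by class: 50 normal ("N") followed by 50 cataract ("C")
labels = ["N"] * 50 + ["C"] * 50
first = next(batches(buffered_shuffle(labels, buffer_size=10, seed=0), 32))
print(first.count("N"), first.count("C"))
```

With a buffer much smaller than the dataset, an input sorted by class still produces batches dominated by one class. Here the first batch can only contain elements from the first 41 inputs, and all of those are "N". A buffer at least as large as the dataset gives a uniform shuffle.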


Building the Convolutional Neural Network

Convolutional neural networks (CNNs) are a class of deep neural networks most commonly applied to analyzing visual imagery [8]. The input layer receives 250x250 RGB images. The first two-dimensional convolutional layer slides a 5x5 window over the input image, extracting features and storing them in a multi-dimensional array. The first layer has 32 filters. Because the spatial size is preserved (which, with a 5x5 window and stride 1, implies "same" padding), its output tensor has shape (250, 250, 32).
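
The (250, 250, 32) shape follows from the standard convolution size formula, out = floor((n + 2p - k) / s) + 1. The sketch below assumes stride 1 and one bias per filter, as in a default Keras `Conv2D` layer. The helper names are hypothetical.

```python
def conv_output_size(n, kernel, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (n + 2 * padding - kernel) // stride + 1

def conv_params(kernel, in_channels, filters):
    """Trainable parameters of a 2D convolution: a k x k x C_in kernel plus one bias per filter."""
    return (kernel * kernel * in_channels + 1) * filters

same_pad = (5 - 1) // 2  # padding of 2 keeps the size for a 5x5 kernel at stride 1
print(conv_output_size(250, 5, padding=same_pad))  # → 250  ('same' padding)
print(conv_output_size(250, 5))                    # → 246  ('valid' padding)
print(conv_params(5, 3, 32))                       # → 2432
```

So the first layer holds only 2,432 weights, yet it produces an output with 32 times as many values per pixel as there are filters' worth of channels. This is why the pooling layers that follow are needed to keep later layers small.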


A rectified linear unit (ReLU) activation is applied after every convolutional layer. An activation function decides whether a neuron should be activated based on the weighted sum of its inputs. ReLU outputs the input directly if it is positive; otherwise, it outputs zero. Because rectified linear units are nearly linear, they preserve many of the properties that make linear models easy to optimize with gradient-based methods and that help them generalize well [9].
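
Written out, ReLU and its derivative make the "nearly linear" property concrete. For every active unit (z > 0), the gradient passes through unchanged. The derivative at exactly z = 0 is undefined, and in practice frameworks pick a value there (commonly 0).

```latex
\mathrm{ReLU}(z) = \max(0, z),
\qquad
\frac{d\,\mathrm{ReLU}(z)}{dz} =
\begin{cases}
1, & z > 0 \\
0, & z < 0
\end{cases}
```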


A common way to reduce the spatial dimensions is to add max-pooling layers to the network. Pooling progressively reduces the spatial size of the representation. This lowers the amount of computation and the number of parameters in later layers, since pooling itself has no trainable parameters. For each region covered by the pooling window (5×5 in my model), max pooling takes the maximum value of that region. These maxima form a new output matrix in which each element is the maximum of the corresponding region of the input.
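
Non-overlapping max pooling can be written in a few lines of NumPy by reshaping the input into blocks and taking the maximum of each block. The sketch assumes stride equal to the window size and "valid" padding (leftover rows or columns are dropped). This matches the default behavior of Keras `MaxPooling2D`. The function name is hypothetical.

```python
import numpy as np

def max_pool2d(x, pool):
    """Non-overlapping max pooling (stride == pool size, 'valid' padding) on an (H, W, C) array."""
    h, w, c = x.shape
    oh, ow = h // pool, w // pool
    x = x[: oh * pool, : ow * pool]  # drop rows/columns that do not fill a full window
    # axes: (output row, row within window, output col, col within window, channel)
    return x.reshape(oh, pool, ow, pool, c).max(axis=(1, 3))

x = np.arange(16).reshape(4, 4, 1)
print(max_pool2d(x, 2)[:, :, 0].tolist())           # → [[5, 7], [13, 15]]
print(max_pool2d(np.zeros((250, 250, 32)), 5).shape)  # → (50, 50, 32)
```

A 5×5 window shrinks a 250×250 feature map to 50×50, a 25-fold reduction in positions per channel.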


To prevent overfitting, two dropout layers with a rate of 45% were added to the model. Batch normalization layers were also added. Batch normalization is a method for making the training of artificial neural networks faster and more stable [10]. It re-centers and re-scales the outputs of a layer (normalizing them with the mini-batch mean and variance, then applying a learned scale and shift), which keeps the inputs to the following activation functions in a well-behaved range.
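
The training-time behavior of both layers can be sketched in NumPy. Batch normalization here normalizes over every axis except the channel axis and uses epsilon = 0.001, the Keras default. The moving averages used at inference time are omitted. Dropout is the "inverted" variant, which scales the surviving units by 1 / (1 - rate) so that the expected activation is unchanged. Function names are hypothetical.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-3):
    """Training-mode batch normalization over all axes except the last (channel) axis."""
    axes = tuple(range(x.ndim - 1))
    mean = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta  # learned scale and shift

def dropout(x, rate, rng):
    """Inverted dropout: zero a fraction `rate` of units and rescale the rest by 1 / (1 - rate)."""
    keep = rng.random(x.shape) >= rate
    return np.where(keep, x / (1.0 - rate), 0.0)

rng = np.random.default_rng(0)
feats = rng.normal(5.0, 3.0, size=(32, 10, 10, 4))  # a batch of 32 feature maps, 4 channels
normed = batch_norm(feats, gamma=1.0, beta=0.0)
print(normed.mean(axis=(0, 1, 2)).round(6), normed.std(axis=(0, 1, 2)).round(3))
dropped = dropout(np.ones((1000, 1000)), rate=0.45, rng=rng)
print((dropped == 0).mean().round(3), dropped.mean().round(3))
```

After normalization, each channel has approximately zero mean and unit variance. After dropout with rate 0.45, about 45% of the units are zero while the mean activation stays close to 1.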


Finally, the "cube" is flattened. No hidden fully connected layers are used, to keep the network simple and training fast. The flattened features feed directly into the output layer, a dense layer with 8 units because there are 8 labels (diseases) in the dataset. Since this is a multi-label classification problem (a sample can belong to several classes at once), a sigmoid activation is applied to the last layer. The sigmoid converts each output score to a value between 0 and 1 independently of the other scores. By contrast, softmax couples the outputs so that they sum to 1. This independence is why sigmoid is the usual choice for multi-label classification. With sigmoid outputs, each label is treated as a separate binary decision, so binary cross-entropy is used as the loss. The optimizer is Adam with a low learning rate of 0.0001, chosen because of the overfitting problems I faced during training. The entire architecture of my CNN is presented in Fig. 3.
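
The difference between sigmoid and softmax outputs, and the binary cross-entropy loss that goes with sigmoid, can be checked numerically. The logits below are made-up values for a hypothetical eye with both diabetes (D) and cataract (C), in the label order N, D, G, C, A, H, M, O. In Keras, the corresponding setup would be an output `Dense(8, activation="sigmoid")` compiled with `loss="binary_crossentropy"`.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

def binary_cross_entropy(y_true, p, eps=1e-7):
    """Mean over labels of -[y log p + (1 - y) log(1 - p)]."""
    p = np.clip(p, eps, 1 - eps)
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))

# Hypothetical logits for labels N, D, G, C, A, H, M, O: the model "sees" D and C.
z = np.array([-3.0, 2.0, -3.0, 2.0, -3.0, -3.0, -3.0, -3.0])
y_true = np.array([0, 1, 0, 1, 0, 0, 0, 0])

print(np.round(sigmoid(z), 3))  # D and C each get about 0.88: both labels can be "on"
print(np.round(softmax(z), 3))  # D and C split the probability mass, each below 0.5
print(binary_cross_entropy(y_true, sigmoid(z)))
```

With a 0.5 threshold per label (a common convention, assumed here), the sigmoid outputs correctly flag both diseases. The softmax outputs would flag neither.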



Fig. 3: Model summary

Experiments and Results

For simplicity, I started with proof-of-concept experiments on a smaller, less complex dataset to test whether all of my assumptions were correct. The first model only distinguishes normal eyes from eyes with cataract, using only the images labeled N (normal) or C (cataract). It performed very well, reaching 93% validation accuracy after 12 epochs, which indicates that a CNN can reliably detect cataract. In each subsequent experiment, I added new classes to the dataset. The fourth experiment used the entire ODIR dataset and reached almost 50% validation accuracy. The results are summarized in Table 1. They suggest that overall performance is limited largely because diabetes is hard to detect visually: an eye with diabetes often looks very similar to a normal fundus. Myopia and cataract are easier to detect because they look very different from each other and from a healthy eye. Figure 4 shows examples of the diseases considered in this project.



Table 1: Experiment results. Legend: normal, cataract, myopia, AMD, diabetes; ALL: model trained on the entire ODIR dataset.


Fig. 4: Examples of different eye diseases. Diabetes is among the most difficult to detect, while cataract is relatively easy because it deviates strongly from a normal fundus.

The same neural network architecture was used in all experiments. The experiments differed in the number of training epochs needed to reach the reported results: some had to be stopped early, while others needed more epochs. In addition, the experiments that did not use the entire dataset were multi-class rather than multi-label problems, so they used a softmax activation with a categorical cross-entropy loss.
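
The article does not say which stopping rule was used. A common choice is patience-based early stopping on the validation loss, which Keras provides as `tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=...)`. Its core logic can be sketched in plain Python. The function name, the loss values, and the patience are hypothetical.

```python
def early_stopping(val_losses, patience, min_delta=0.0):
    """Return (best_epoch, last_epoch_run) for patience-based early stopping (0-based epochs)."""
    best_loss = float("inf")
    best_epoch = 0
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return best_epoch, epoch  # stop: no improvement for `patience` epochs
    return best_epoch, len(val_losses) - 1

# Hypothetical validation losses per epoch
print(early_stopping([1.0, 0.8, 0.7, 0.72, 0.71, 0.75, 0.69], patience=3))  # → (2, 5)
```

In this example, training stops after epoch 5 even though epoch 6 would have improved slightly. The patience value therefore trades training time against the risk of stopping too soon.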


Final Considerations on Model Scalability

In the era of Big Data, it is essential to evaluate every IT project in terms of scalability and reproducibility. From the beginning of this project, I kept in mind that although it is a research project, more eye-disease data may become available in the future. The model could then be retrained on a larger dataset and would likely achieve better results. The main goal was therefore to build a general data pipeline able to handle a larger number of data points. This was largely achieved with TensorFlow, in particular its dataset object, which supports ETL (Extract, Transform, Load) processes on large datasets. However, two transformations had to be performed before creating the TensorFlow dataset object: image resizing and augmentation of minority classes. In the future, resizing could perhaps be performed on the fly more efficiently, together with additional augmentation functions such as random rotation. With a larger dataset, though, such augmentation might become unnecessary, since enough natural image variation would already be available. Compared with other widely used deep learning datasets, ODIR is relatively small, so data augmentation and oversampling were used to obtain meaningful results despite the limited number of data points.


Summary

In this project, I have shown that convolutional neural networks can be used to detect various eye diseases. The best result was cataract detection, with 93% validation accuracy. Training on all diseases at once led to significantly lower results. Although the ODIR dataset covers many diseases, the image variations needed to train the model on specific diseases were not always sufficiently represented, which affected the final performance. I believe that a larger dataset would improve prediction accuracy and further support automated disease detection.


Overall, the project illustrates how convolutional neural networks, already widely used for analyzing medical images, can be applied to ocular disease recognition, and it points to opportunities for future work with larger datasets.
