
CNN Image Classification in Practice: LeNet

Author: 禅与计算机程序设计艺术

1. Introduction

A convolutional neural network (CNN) is a type of deep learning model and one of the key techniques in machine learning. CNNs are widely used in image recognition, video analysis, speech processing, and many other fields, and have achieved excellent results. A CNN applies convolution operations to the input to extract local features, and pooling operations to reduce the spatial dimensions and raise the level of abstraction; after a sequence of such computations it produces a classification result (a minimal numpy sketch of these two operations follows the list below). Compared with traditional hand-engineered approaches, a CNN offers clear advantages: weight sharing keeps the number of parameters and the computational cost manageable, it can work directly on unstructured data such as raw images, and it markedly improves classification accuracy and robustness. Its main characteristics are:

  • Strong feature learning: a CNN captures both the overall layout of an image and its fine details, turning them into representations that later layers can classify efficiently.
  • Modular architecture: the layers are loosely coupled building blocks, so the network can be adapted to different tasks with only minor changes, and the same convolutional building blocks can be reused throughout the model.
  • Parameter sharing: each convolution kernel's weights are reused at every spatial position of the input, which greatly reduces the number of parameters and the computational cost.
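
To make these two operations concrete, here is a minimal numpy sketch of a single-channel "valid" convolution with one shared kernel followed by 2x2 max pooling. It is an illustration only and is independent of the LeNet code later in the article; the kernel here is random rather than learned.

    import numpy as np

    def conv2d_valid(image, kernel):
        """Slide one shared kernel over the image (stride 1, no padding)."""
        H, W = image.shape
        kh, kw = kernel.shape
        out = np.zeros((H - kh + 1, W - kw + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                # the same kernel weights are reused at every position (parameter sharing)
                out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
        return out

    def max_pool2d(feature_map, size=2, stride=2):
        """Downsample by keeping the maximum of each size x size window."""
        H, W = feature_map.shape
        out_h, out_w = (H - size) // stride + 1, (W - size) // stride + 1
        out = np.zeros((out_h, out_w))
        for i in range(out_h):
            for j in range(out_w):
                out[i, j] = feature_map[i * stride:i * stride + size,
                                        j * stride:j * stride + size].max()
        return out

    image = np.random.rand(28, 28)   # a toy grayscale "image"
    kernel = np.random.randn(3, 3)   # one filter: only 9 weights, shared across the whole image
    pooled = max_pool2d(conv2d_valid(image, kernel))
    print(pooled.shape)              # (13, 13): 28 -> 26 after the convolution, 26 -> 13 after pooling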

However, the strong non-linearity and depth of a CNN also make modeling image data challenging: training such a network effectively requires a carefully designed architecture. LeNet, introduced below, is one of the earliest and most classic such architectures.

2. Core Concepts and Terminology

2.1 LeNet Model Structure

The LeNet model used here consists of five layers:

  1. C1: the first convolutional layer, with 6 filters of size 5x5 applied with stride 1, producing 6 feature maps. The text describes a sigmoid activation after this layer; the implementation below uses ReLU instead.
  2. S2: a pooling layer using 2x2 max pooling with stride 2.
  3. C3: a second convolutional layer with 16 filters of size 5x5 and stride 1, again followed by the activation function.
  4. S4: another 2x2 max-pooling layer with stride 2.
  5. FC5: the fully connected part. In the original LeNet-5 this consists of layers with 120, 84, and 10 units, where 10 is the number of classes and the final layer uses a softmax activation; the flattened 5x5x16 = 400 features from S4 feed into the 120-unit layer. (The implementation below goes directly from 120 units to the 10-unit output, omitting the 84-unit layer.)

The overall structure of the model is thus C1 → S2 → C3 → S4 → FC5 (the original figure is not reproduced here).
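
As a quick sanity check on these layer sizes, the following sketch traces the feature-map shapes from the input to the flattened vector that feeds FC5. It assumes a 32x32 input, as in the original LeNet-5; 28x28 MNIST images are typically zero-padded to 32x32, which is what makes the flattened size come out to 400 (with an unpadded 28x28 input it would be 16 x 4 x 4 = 256 instead).

    def conv_out(size, kernel=5, stride=1, padding=0):
        return (size + 2 * padding - kernel) // stride + 1

    def pool_out(size, window=2, stride=2):
        return (size - window) // stride + 1

    s = 32             # input: 1 x 32 x 32
    s = conv_out(s)    # C1: 6 x 28 x 28
    s = pool_out(s)    # S2: 6 x 14 x 14
    s = conv_out(s)    # C3: 16 x 10 x 10
    s = pool_out(s)    # S4: 16 x 5 x 5
    print(16 * s * s)  # 400 features are flattened and fed into FC5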

2.2 LeNet Model Training Process

Dataset Preparation

First, download the MNIST handwritten digit dataset. MNIST consists of 60,000 training images and 10,000 test images, each a 28x28 grayscale picture. To keep the example easy to follow, only the first 10,000 training images are used here, restricted to two classes: "0" for the digit 0 and "1" for the digit 1. To match the LeNet input format, every image is converted to a single-channel 28x28 grayscale array with pixel values scaled to [0, 1]. The converted training set "mnist_train_0_1.npy" and test set "mnist_test_0_1.npy" are provided. The same transformation must also be applied to every test sample at prediction time, so training and testing can share the same code.

    import numpy as np
    from sklearn.utils import shuffle

    def load_data():
        # load the pre-converted arrays from disk
        mnist_train = np.load("mnist_train_0_1.npy", allow_pickle=True).item()
        mnist_test = np.load("mnist_test_0_1.npy", allow_pickle=True).item()

        X_train = mnist_train["X"] / 255.0  # normalize pixel values to [0, 1]
        y_train = mnist_train["y"]

        X_test = mnist_test["X"] / 255.0
        y_test = mnist_test["y"]

        # keep only the first 10,000 training samples for faster experimentation
        X_train = X_train[:10000, :, :]
        y_train = y_train[:10000]
        return (X_train, y_train), (X_test, y_test)
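
A minimal usage sketch of load_data, assuming the two .npy files are in the working directory (the exact array shapes depend on how the files were saved, so the commented shapes are an expectation rather than a guarantee):

    (X_train, y_train), (X_test, y_test) = load_data()
    print(X_train.shape, y_train.shape)   # expected: (10000, 28, 28) (10000,)
    print(X_test.shape, y_test.shape)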

With the data loaded, the next step is to initialize the network's parameters.

Parameter Initialization

In this section we fix the main settings of the LeNet network: the training hyperparameters (learning rate, number of epochs, batch size, and regularization strength) and the per-layer parameters. The biases b_i of each layer are initialized to zero, while the convolutional kernels W_{cnn} and the fully connected weight matrices W_{fully} are initialized with small Gaussian random values whose scale follows a Xavier/He-style rule based on the layer's fan-in and fan-out, which helps keep activations well-behaved early in training.

    class lenet:
        def __init__(self, learning_rate=0.1, num_epochs=10, batch_size=100, reg=0.0):
            self.learning_rate = learning_rate
            self.num_epochs = num_epochs
            self.batch_size = batch_size
            self.reg = reg

            # weight initialization: Gaussian values scaled by a Xavier/He-style factor
            self.params = {}
            self.params['C1'] = {'weights': np.random.randn(6, 1, 5, 5) * np.sqrt(2 / (5 * 5 + 6)),
                                 'bias': np.zeros(6)}
            self.params['S2'] = {'pooling': None}   # pooling layers have no trainable parameters
            self.params['C3'] = {'weights': np.random.randn(16, 6, 5, 5) * np.sqrt(2 / (5 * 5 + 16)),
                                 'bias': np.zeros(16)}
            self.params['S4'] = {'pooling': None}
            self.params['FC5'] = {'weights': np.random.randn(120, 400) * np.sqrt(2 / (400 + 120)),
                                  'bias': np.zeros(120),
                                  'output': {'weights': np.random.randn(10, 120) * np.sqrt(2 / (120 + 10)),
                                             'bias': np.zeros(10)}}

By default the learning rate is 0.1, the number of epochs is 10, the batch size is 100, and the regularization coefficient is set to the small value 0.0. The dictionary self.params then holds the trainable parameters of every layer, initialized with Gaussian random values from numpy's random module. The two convolutional layers C1 and C3 each store a weight tensor of shape [number of filters, number of input channels, kernel height, kernel width] together with one bias per filter. The pooling layers S2 and S4 have no trainable parameters, so they appear only as placeholders. The fully connected block FC5 stores the 400-to-120 weight matrix and its bias, plus a nested 'output' entry for the final layer, whose weight matrix has shape [number of classes, number of hidden units] and whose bias has one entry per class.
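
For reference, the two initialization rules mentioned above differ only in how they scale the Gaussian noise: Xavier/Glorot uses both fan-in and fan-out, while He uses fan-in only and is the usual choice in front of ReLU activations. The sketch below is illustrative and is not part of the class; the class itself applies a Glorot-style factor with a simplified fan count (for example sqrt(2 / (5*5 + 6)) for C1).

    def glorot_normal(shape, fan_in, fan_out):
        # Xavier/Glorot initialization: variance 2 / (fan_in + fan_out)
        return np.random.randn(*shape) * np.sqrt(2.0 / (fan_in + fan_out))

    def he_normal(shape, fan_in):
        # He initialization: variance 2 / fan_in, suited to ReLU layers
        return np.random.randn(*shape) * np.sqrt(2.0 / fan_in)

    W_c1 = glorot_normal((6, 1, 5, 5), fan_in=1 * 5 * 5, fan_out=6 * 5 * 5)
    W_fc = he_normal((120, 400), fan_in=400)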

Model Construction

Next, we implement the forward pass and the backward pass of the LeNet model, building up the convolutional layers (C1 and C3) and the fully connected layer (FC5) step by step.

    class lenet:
        ...

        def forward(self, X, mode='training'):
            # C1 -> ReLU -> S2: first convolution block
            A1 = conv_forward(X, self.params['C1']['weights'], self.params['C1']['bias'])
            A1 = relu_forward(A1)
            A1 = max_pool_forward(A1, pool_height=2, pool_width=2, stride=2)

            # C3 -> ReLU -> S4: second convolution block
            A2 = conv_forward(A1, self.params['C3']['weights'], self.params['C3']['bias'])
            A2 = relu_forward(A2)
            A2 = max_pool_forward(A2, pool_height=2, pool_width=2, stride=2)

            # flatten -> FC5 (400 -> 120) -> output layer (120 -> 10 class scores)
            A3 = flatten_forward(A2)
            A4 = fc_forward(A3, self.params['FC5']['weights'], self.params['FC5']['bias'])
            Z = fc_forward(A4, self.params['FC5']['output']['weights'],
                           self.params['FC5']['output']['bias'])

            self.softmax_input = Z  # pre-softmax class scores, used by the loss and by predict()
            if mode == 'training':
                # keep the intermediate activations needed by the backward pass
                self.cache = {"A1": A1, "A2": A2, "A3": A3, "A4": A4, "Z": Z}

        def predict(self, X):
            self.forward(X, mode='prediction')
            # the scores have shape (batch_size, num_classes), so take the argmax per sample
            predicted_class = np.argmax(self.softmax_input, axis=1)
            return predicted_class
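
The helpers conv_forward, relu_forward, max_pool_forward, flatten_forward and fc_forward are called above but never defined in the article. Purely as an illustration, here is a naive sketch of what they might look like; the signatures are inferred from the calls above, so treat this as an assumption rather than the article's actual implementation.

    def conv_forward(X, W, b, stride=1):
        """Naive 'valid' convolution. X: (N, C, H, W), W: (F, C, KH, KW), b: (F,)."""
        N, C, H, Wd = X.shape
        F, _, KH, KW = W.shape
        out_h = (H - KH) // stride + 1
        out_w = (Wd - KW) // stride + 1
        out = np.zeros((N, F, out_h, out_w))
        for n in range(N):
            for f in range(F):
                for i in range(out_h):
                    for j in range(out_w):
                        patch = X[n, :, i * stride:i * stride + KH, j * stride:j * stride + KW]
                        out[n, f, i, j] = np.sum(patch * W[f]) + b[f]
        return out

    def relu_forward(X):
        return np.maximum(0, X)

    def max_pool_forward(X, pool_height=2, pool_width=2, stride=2):
        """Channel-wise max pooling over non-overlapping windows."""
        N, C, H, Wd = X.shape
        out_h = (H - pool_height) // stride + 1
        out_w = (Wd - pool_width) // stride + 1
        out = np.zeros((N, C, out_h, out_w))
        for i in range(out_h):
            for j in range(out_w):
                window = X[:, :, i * stride:i * stride + pool_height, j * stride:j * stride + pool_width]
                out[:, :, i, j] = window.max(axis=(2, 3))
        return out

    def flatten_forward(X):
        return X.reshape(X.shape[0], -1)

    def fc_forward(X, W, b):
        # W has shape (out_features, in_features), matching the (120, 400) weights above
        return X @ W.T + b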

With the forward pass in place, we can compute the loss and run backpropagation to update the parameters. We start with the cross-entropy loss and its derivatives for C1, C3, and FC5.

    class lenet:
        ...

        def compute_loss(self, Y):
            # cross-entropy loss on the class scores, plus an L2 regularization term
            loss, dZ = cross_entropy_loss(Y, self.softmax_input, derivative=True)
            dZ += regularization_loss(self.params, reg=self.reg) * self.reg
            self.grads = {"dZ": dZ}

        def backward(self):
            # propagate the score gradient back through FC5, the flatten step,
            # the S4/S2 pooling layers and the C3/C1 convolutions
            dA5 = self.grads["dZ"]
            dA5 = fc_backward(dA5, self.cache['Z'], self.params['FC5']['weights'])
            dA3 = flatten_backward(dA5, self.cache['A3'])
            dA2 = max_pool_backward(dA3, cache={'A1': self.cache['A1']},
                                    pool_height=2, pool_width=2, stride=2)
            dA1 = conv_backward(dA2, cache={'A0': None},
                                weights=self.params['C3']['weights'], padding=(0, 0))

            # gradient-descent update for C3, with L2 weight decay
            dB3 = dA2
            dW3, db3 = params_gradient(dB3, self.cache['A2'])
            self.params['C3']['weights'] -= self.learning_rate * dW3 + self.reg * self.params['C3']['weights']
            self.params['C3']['bias'] -= self.learning_rate * db3

            # gradient-descent update for C1, with L2 weight decay
            # (note: as written, the fully connected weights in FC5 are not updated here)
            dB2 = max_pool_backward(dA1, cache={'A0': None},
                                    pool_height=2, pool_width=2, stride=2)
            dB1 = conv_backward(dB2, cache={'A0': None},
                                weights=self.params['C1']['weights'], padding=(0, 0))
            dW1, db1 = params_gradient(dB1, self.cache['A1'])
            self.params['C1']['weights'] -= self.learning_rate * dW1 + self.reg * self.params['C1']['weights']
            self.params['C1']['bias'] -= self.learning_rate * db1
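
The cross_entropy_loss helper (and the regularization_loss helper) are likewise not shown in the article. A common way to implement a softmax cross-entropy loss that also returns the gradient with respect to the pre-softmax scores Z is sketched below; the signature is an assumption based on how compute_loss calls it, with Y as one-hot labels of shape (N, K) and Z of shape (N, K).

    def cross_entropy_loss(Y, Z, derivative=False):
        # softmax with max-subtraction for numerical stability
        shifted = Z - Z.max(axis=1, keepdims=True)
        probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
        N = Y.shape[0]
        loss = -np.sum(Y * np.log(probs + 1e-12)) / N
        if derivative:
            dZ = (probs - Y) / N   # gradient of the loss with respect to Z
            return loss, dZ
        return loss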

With the loss and gradient computations in place, we can assemble them into a training loop that repeatedly performs mini-batch gradient descent.

    class lenet:
        ...

        def train(self, X, Y):
            num_batches = int(np.ceil(X.shape[0] / float(self.batch_size)))
            cost = []
            for epoch in range(self.num_epochs):
                print('Epoch:', epoch + 1)

                # shuffle the training set before each epoch
                X, Y = shuffle(X, Y, random_state=epoch)

                for i in range(num_batches):
                    start = i * self.batch_size
                    end = min((i + 1) * self.batch_size, X.shape[0])

                    # one iteration of gradient descent on a mini-batch
                    self.forward(X[start:end], mode='training')
                    self.compute_loss(Y[start:end])
                    self.backward()

                    # rough progress metrics for this batch (Y is expected to be one-hot encoded)
                    mse = np.sum((self.softmax_input - Y[start:end]) ** 2) / self.softmax_input.shape[0]
                    acc = np.sum(np.argmax(Y[start:end], axis=1) == np.argmax(self.softmax_input, axis=1)) / self.softmax_input.shape[0]

                    # display progress every few batches
                    if ((i + 1) % 10 == 0 or i == num_batches - 1) and not i == 0:
                        print('\tBatch:', i + 1, '| MSE:', '{:.4f}'.format(mse), '| Accuracy:', '{:.2f}%'.format(acc * 100))

                # record the last batch's error after each epoch
                cost.append(mse)

            return cost

Putting everything together, the complete LeNet model is the lenet class with the __init__, forward, predict, compute_loss, backward, and train methods defined in the sections above.
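
As a hypothetical end-to-end usage sketch (it assumes the helper functions called inside the class, such as conv_forward, cross_entropy_loss, params_gradient and regularization_loss, are implemented, and that the label arrays contain integer class indices):

    (X_train, y_train), (X_test, y_test) = load_data()

    # the convolution layers expect an explicit channel axis: (N, 1, 28, 28)
    X_train = X_train.reshape(-1, 1, 28, 28)
    X_test = X_test.reshape(-1, 1, 28, 28)

    # the training loop compares one-hot labels against the network's class scores
    def one_hot(y, num_classes=10):
        out = np.zeros((y.shape[0], num_classes))
        out[np.arange(y.shape[0]), y.astype(int)] = 1.0
        return out

    model = lenet(learning_rate=0.1, num_epochs=10, batch_size=100)
    cost = model.train(X_train, one_hot(y_train))

    predictions = model.predict(X_test)
    accuracy = np.mean(predictions == y_test)
    print('Test accuracy: {:.2f}%'.format(accuracy * 100))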

