
【AI】Stanford CS231n Exercises (1): KNN and SVM Classification


Table of Contents

    • I. Introduction
      • 1. What is CS231n?
      • 2. Tasks for this post
      • 3. The dataset
    • II. Background
      • 1. What is KNN?
      • 2. What is SVM?
        • Components of an SVM
    • III. Experiments: KNN and SVM classification
      • 1. KNN image classification (key steps are reflected in this outline)
        • (1) Change directories in Colab and load the dataset
        • (2) Load packages, settings, and external modules
        • (3) Load the data and do initial processing
        • (4) Visualize a few images from the dataset
        • (5) Subsample the training and test data
        • (6) Create the KNN classifier
        • (7) Complete the distance code in k_nearest_neighbor.py and test its accuracy
        • (8) Speed up the distance computation with vectorized operations
        • (9) Compare the speed of the distance implementations
        • (10) Cross-validation
        • (11) Visualize the accuracy of k-nearest neighbors for each k
        • Experiment summary
      • 2. SVM image classification
        • (1) Set up the notebook
        • (2) Load and preprocess CIFAR-10
        • (3) Split the data into training, validation, test, and dev sets
        • (4) Data preprocessing
        • (5) Build the SVM classifier
        • (6) Check the gradient after implementing it yourself
        • (7) Optimize the weights with gradient updates (SGD)
        • (8) Tune the hyperparameters (regularization strength and learning rate)
        • (9) Visualize the learned weights for each class

I. Introduction

1. What is CS231n?

CS231n is the Stanford course "Convolutional Neural Networks for Visual Recognition"; it is often referred to simply as the convolutional neural network course.

2. Tasks for this post

  • Work through the image classification assignment on the Google Colab platform; see the official assignment page for the details.
    • That is, complete the following two parts:
      • Q1: k-Nearest Neighbor classifier
      • Q2: Training a Support Vector Machine

3. The dataset

CIFAR-10 is a well-known image classification dataset. It contains 60,000 small images at a resolution of 32×32, and every image carries one of 10 classification labels. The 60,000 images are split into a training set of 50,000 images and a test set of 10,000 images.

II. Background

1. What is KNN?

KNN is short for the k-Nearest Neighbor classifier (k-NN); it is a generalization of the Nearest Neighbor method.

  • The simplest version is the Nearest Neighbor classifier. It is not a neural network: its core idea is to compute the L1 distance between the input image and every image stored in the training set, and once all distances are computed, the label of the closest training image becomes the prediction for the test image.
    • The k-Nearest Neighbor classifier extends this: it finds the k training samples most similar to the test image and lets their labels vote on the final prediction.
    • In particular, when k = 1 this reduces to the plain Nearest Neighbor classifier. A minimal sketch of the idea follows.
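To make the idea concrete, here is a minimal NumPy sketch of L1-distance nearest-neighbor and k-NN prediction. The function and variable names are my own, not the assignment's, and the random data only stands in for flattened CIFAR-10 images:

    import numpy as np

    def nearest_neighbor_predict(X_train, y_train, x_test):
        # L1 (Manhattan) distance between the test image and every training image
        dists = np.sum(np.abs(X_train - x_test), axis=1)   # shape (num_train,)
        # the label of the closest training image is the prediction
        return y_train[np.argmin(dists)]

    def knn_predict(X_train, y_train, x_test, k=5):
        # k-NN: take the k closest training images and let their labels vote
        dists = np.sum(np.abs(X_train - x_test), axis=1)
        closest_labels = y_train[np.argsort(dists)[:k]]
        return np.bincount(closest_labels).argmax()

    # toy usage with random data standing in for flattened CIFAR-10 images
    X_train = np.random.rand(5000, 3072)
    y_train = np.random.randint(0, 10, size=5000)
    x_test = np.random.rand(3072)
    print(nearest_neighbor_predict(X_train, y_train, x_test))
    print(knn_predict(X_train, y_train, x_test, k=5))

With k = 1 the second function behaves exactly like the first, which is the equivalence mentioned above.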

Pros, cons, and use cases of the k-Nearest Neighbor classifier:

  • It is simple to understand and easy to implement.
    • The algorithm needs essentially no training time: the training phase just stores the input data. The test phase, however, is expensive, because every test image has to be compared against every stored training sample.
    • The classifier has to keep all training samples around in order to compare them against future test data. This is an inefficient use of storage and becomes impractical once the data reaches gigabytes in size.
    • It can work reasonably well when the data is low-dimensional, but it is rarely used in practice: the data we usually care about (such as images) is very high-dimensional (many pixels), and distances between points in high-dimensional spaces often defy our intuition.

2. What is SVM?

SVM is short for Support Vector Machine. It is a binary classification model and can be viewed as a linear classifier. To explain the basic idea, start from the simplest possible function, a linear mapping: for any input x, w^T x + b > 0 means x belongs to the positive class, w^T x + b < 0 means it belongs to the negative class, and points satisfying the equality lie on the decision boundary.

  • This is the familiar linear function, usually seen in its one-dimensional form (where W is one-dimensional).
  • Extending it to many dimensions gives exactly the situation our SVM has to handle.
  • The first step is to stretch every image into a column vector of length D, i.e. of size [D x 1]. A matrix W of size [K x D] and a column vector b of size [K x 1] are the parameters of the function.
  • Take CIFAR-10 as an example: one image has size [32 x 32 x 3], containing all of the image's pixel information, and it is flattened into a [3072 x 1] column vector; W then has size [10 x 3072] and b has size [10 x 1].
  • So 3072 numbers (the pixel values) go into the function and 10 numbers come out (the scores for the different classes). The parameter W is called the weights and b is called the bias vector. A small sketch of this score computation follows this list.
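For illustration only (the array names and random values here are my own), the score computation with these shapes looks like:

    import numpy as np

    # one flattened CIFAR-10 image: [32 x 32 x 3] -> [3072 x 1] column vector
    x = np.random.randint(0, 256, size=(32, 32, 3)).reshape(3072, 1)

    W = np.random.randn(10, 3072) * 0.0001   # weights, [10 x 3072]
    b = np.zeros((10, 1))                    # bias vector, [10 x 1]

    scores = W.dot(x) + b                    # [10 x 1], one score per class
    print(scores.ravel())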

As an example of the high-dimensional case, consider CIFAR-10:

In CIFAR-10, every image is turned into a 3072-dimensional vector, so we can think of classification as happening in that high-dimensional space: each image (with its color channels) maps to a point in a 3072-dimensional space, and a linear classifier defines a hyperplane in that space that separates the classes.

  • The image-space illustration shows several images and three classifiers. Taking the red car classifier as an example, all points above (to the right of) the red line get positive scores that grow linearly with distance from the line, while points below (to the left of) the line get negative scores that decrease linearly.
  • Our task is to find the parameters W and the bias b so that this hyperplane separates the classes well. Concretely, we keep adjusting W and b (rotating and translating the hyperplane) until the classification error is minimized.
Components of an SVM:

Image data preprocessing:
The method above uses raw pixel values (which range from 0 to 255). To get better performance from machine learning models, normalizing the input features is a standard step. In image processing every pixel is a separate feature, and in practice the features are usually standardized: compute the mean over all training images and subtract this mean image from every sample to remove the overall brightness offset, after which pixel values roughly fall in the range [-127, 127]. A further step is to scale these values down into the range [-1, 1]. A small sketch of this preprocessing is shown below.
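Here is what that looks like in NumPy. The toy arrays are stand-ins for the real data; the assignment itself does the mean subtraction later, while the extra scaling step is optional:

    import numpy as np

    # toy stand-ins for flattened images with pixel values in [0, 255]
    X_train = np.random.randint(0, 256, size=(5000, 3072)).astype(np.float64)
    X_test = np.random.randint(0, 256, size=(500, 3072)).astype(np.float64)

    mean_image = np.mean(X_train, axis=0)   # per-pixel mean over the training set

    X_train -= mean_image                   # values now roughly in [-127, 127]
    X_test -= mean_image                    # always subtract the mean computed on the training set

    X_train /= 127.0                        # optional: scale roughly into [-1, 1]
    X_test /= 127.0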

Loss function:
The central question now is how to measure how far off the classifier is, and the loss function is the tool for that:


The value of this function measures how much the current classifier deviates from the correct answers.
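Concretely, the loss used in CS231n is the multiclass SVM (hinge) loss. For the i-th sample, with class scores s = W x_i, it is

    L_i = \sum_{j \neq y_i} \max(0,\; s_j - s_{y_i} + \Delta)

where Δ is the margin (the assignment uses Δ = 1): every class whose score comes within Δ of the correct class's score contributes to the loss.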

Regularization:
The loss above has a problem. Suppose we have a dataset X and a set of weights W that correctly classifies every sample i (i.e. all the margins are satisfied). Such a W is not unique: scaling it by any factor greater than 1 still satisfies all the margins, so a regularization penalty is added to the loss to express a preference among these weight sets.
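In CS231n the penalty is the L2 norm of the weights, R(W) = \sum_k \sum_l W_{k,l}^2, so the full loss over N training samples becomes

    L = \frac{1}{N} \sum_i L_i + \lambda \sum_k \sum_l W_{k,l}^2

where λ is the regularization strength, a hyperparameter that gets tuned later in the assignment.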

III. Experiments: KNN and SVM classification

The experiments again run on Google Colab, so we need to bring the assignment package into the cloud environment:

  • The cloud environment's file layout differs from the local machine's, so reading files (especially images) from code inside Colab takes some extra setup.
  • Run the following command to grant access to your Google Drive account.
  • You will then be prompted to enter an authorization code.

Then put the assignment files into your Drive.


1. KNN image classification (key steps are reflected in the outline)

All of the following code runs on Colab:

(1) Change directories in Colab and load the dataset
    from google.colab import drive
    
    drive.mount('/content/drive', force_remount=True)
    
    # This step just switches to the assignment directory
    # enter the foldername in your Drive where you have saved the unzipped
    # 'cs231n' folder containing the '.py', 'classifiers' and 'datasets' folders.
    # e.g. 'cs231n/assignments/assignment1/cs231n/'
    
    FOLDERNAME = 'assignment1/cs231n'  # this is my own Drive folder
    assert FOLDERNAME is not None, "[!] Enter the foldername."
    
    %cd drive/My\ Drive
    %cp -r $FOLDERNAME ../../
    %cd ../../
    %cd cs231n/datasets/
    !bash get_datasets.sh
    %cd ../../

After this runs successfully, the dataset download finishes without errors.
(2) Load packages, settings, and external modules
    
    # import some packages and apply settings
    
    # Run some setup code for this notebook.
    import random
    import numpy as np
    from cs231n.data_utils import load_CIFAR10
    import matplotlib.pyplot as plt
    
    # This is a bit of magic to make matplotlib figures appear inline in the notebook rather than in a new window.
    %matplotlib inline
    plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
    plt.rcParams['image.interpolation'] = 'nearest'
    plt.rcParams['image.cmap'] = 'gray'
    
    # Some more magic so that the notebook will reload external python modules;
    # load external modules
    # see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
    %load_ext autoreload
    %autoreload 2
(3) Load the data and do initial processing
    # Load the raw CIFAR-10 data.
    # load the raw data
    cifar10_dir = 'cs231n/datasets/cifar-10-batches-py'
    
    # Cleaning up variables to prevent loading data multiple times (which may cause memory issue)
    # clean up variables to avoid loading the data more than once
    try:
       del X_train, y_train
       del X_test, y_test
       print('Clear previously loaded data.')
    except:
       pass
    
    X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)
    
    # As a sanity check, we print out the size of the training and test data.
    # as a sanity check, print the sizes of the training and test data
    print('Training data shape: ', X_train.shape)
    print('Training labels shape: ', y_train.shape)
    print('Test data shape: ', X_test.shape)
    print('Test labels shape: ', y_test.shape)

After a successful run, the shapes of the training and test data are printed.
(4) Visualize a few images to see what the dataset looks like
    # Visualize some examples from the dataset.
    # visualize some examples from the dataset
    # We show a few examples of training images from each class.
    # show a few training images from each class

    classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
    num_classes = len(classes)
    samples_per_class = 7
    for y, cls in enumerate(classes):
        idxs = np.flatnonzero(y_train == y)
        idxs = np.random.choice(idxs, samples_per_class, replace=False)
        for i, idx in enumerate(idxs):
            plt_idx = i * num_classes + y + 1
            plt.subplot(samples_per_class, num_classes, plt_idx)
            plt.imshow(X_train[idx].astype('uint8'))
            plt.axis('off')
            if i == 0:
                plt.title(cls)
    plt.show()
(5) Subsample the training and test data
    # Subsample the data for more efficient code execution in this exercise
    # subsample the data
    num_training = 5000
    mask = list(range(num_training))
    X_train = X_train[mask]
    y_train = y_train[mask]
    
    num_test = 500
    mask = list(range(num_test))
    X_test = X_test[mask]
    y_test = y_test[mask]
    
    # Reshape the image data into rows
    X_train = np.reshape(X_train, (X_train.shape[0], -1))
    X_test = np.reshape(X_test, (X_test.shape[0], -1))
    
    # print the shapes of the training and test sets
    print(X_train.shape, X_test.shape)

After a successful run, the subsampled shapes are printed.
(6) Create the KNN classifier
    from cs231n.classifiers import KNearestNeighbor
    
    # create a kNN classifier
    # Create a kNN classifier instance. 
    # Remember that training a kNN classifier is a noop: 
    # the Classifier simply remembers the data and does no further processing
     
    classifier = KNearestNeighbor()
    classifier.train(X_train, y_train)

Note that once the KNN classifier has been created, there are two main steps:

  • Compute a distance between every test image and every training image (a measure that grows as two images become less similar).
    • For each test image, use these precomputed distances to find its k nearest neighbors and use their labels to classify (label) the image.

For this assignment, the specific requirements are:

  • How exactly should the distances be computed? That is the graded part of this exercise, and you are not allowed to use numpy's np.linalg.norm() function in your code.
  • Before writing any code, open cs231n/classifiers/k_nearest_neighbor.py and complete the compute_distances_two_loops function defined there. It must iterate over every test/training sample pair with a double loop and compute each element of the distance matrix one at a time; a sketch of such an implementation follows this list.
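For reference, here is one straightforward way to write compute_distances_two_loops. It is a sketch of a method of the KNearestNeighbor class, not necessarily identical to what I submitted; the notebook computes Euclidean (L2) distances here:

    import numpy as np

    def compute_distances_two_loops(self, X):
        # X: test data of shape (num_test, D); self.X_train: shape (num_train, D)
        num_test = X.shape[0]
        num_train = self.X_train.shape[0]
        dists = np.zeros((num_test, num_train))
        for i in range(num_test):
            for j in range(num_train):
                # Euclidean (L2) distance, written out by hand because
                # np.linalg.norm is not allowed
                dists[i, j] = np.sqrt(np.sum((X[i] - self.X_train[j]) ** 2))
        return dists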
(7) Complete the distance code in k_nearest_neighbor.py and test its accuracy

There is quite a bit to fill in.
    # Open cs231n/classifiers/k_nearest_neighbor.py and implement
    # compute_distances_two_loops.
    
    # Test your implementation:
    dists = classifier.compute_distances_two_loops(X_test)
    print(dists.shape)
    
    # We can visualize the distance matrix: each row is a single test example and
    # its distances to training examples
    plt.imshow(dists, interpolation='none')
    plt.show()
    
    # Now implement the function predict_labels and run the code below:
    # We use k = 1 (which is Nearest Neighbor).
    y_test_pred = classifier.predict_labels(dists, k=1)
    
    # Compute and print the fraction of correctly predicted examples
    num_correct = np.sum(y_test_pred == y_test)
    accuracy = float(num_correct) / num_test
    print('Got %d / %d correct => accuracy: %f' % (num_correct, num_test, accuracy))
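The notebook also asks for predict_labels, which turns the distance matrix into predictions. A sketch of a typical implementation (again a method of the KNearestNeighbor class, and again only illustrative):

    import numpy as np

    def predict_labels(self, dists, k=1):
        # dists: (num_test, num_train) matrix of distances
        num_test = dists.shape[0]
        y_pred = np.zeros(num_test, dtype=self.y_train.dtype)
        for i in range(num_test):
            # labels of the k training points closest to the i-th test point
            closest_y = self.y_train[np.argsort(dists[i])[:k]]
            # majority vote; np.bincount + argmax breaks ties toward the smaller label
            y_pred[i] = np.bincount(closest_y).argmax()
        return y_pred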

As the course notebook notes, an accuracy of about 27% means the implementation is correct, and you can move on to the next step.

Next, tweak the parameter k (the k of k-nearest neighbors):
    y_test_pred = classifier.predict_labels(dists, k=5)
    num_correct = np.sum(y_test_pred == y_test)
    accuracy = float(num_correct) / num_test
    print('Got %d / %d correct => accuracy: %f' % (num_correct, num_test, accuracy))

The accuracy then improves slightly, by about 0.4 percentage points.
(8) Speed up the distance computation with vectorized operations instead of explicit loops
    # Now lets speed up distance matrix computation by using partial vectorization
    # with one loop. Implement the function compute_distances_one_loop and run the
    # code below:
    dists_one = classifier.compute_distances_one_loop(X_test)
    
    # To ensure that our vectorized implementation is correct, we make sure that it agrees with the naive implementation. There are many ways to decide whether two matrices are similar; one of the simplest is the Frobenius norm. In case
    # you haven't seen it before, the Frobenius norm of two matrices is the square
    # root of the squared sum of differences of all elements; in other words, reshape the matrices into vectors and compute the Euclidean distance between them.
    difference = np.linalg.norm(dists - dists_one, ord='fro')
    print('One loop difference was: %f' % (difference, ))
    if difference < 0.001:
        print('Good! The distance matrices are the same')
    else:
        print('Uh-oh! The distance matrices are different')

If this runs and reports that the two matrices match, the distance computation can safely be switched to the vectorized form.
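One way compute_distances_one_loop can be written (a sketch; broadcasting handles the training dimension):

    import numpy as np

    def compute_distances_one_loop(self, X):
        # a single loop over the test points
        num_test = X.shape[0]
        num_train = self.X_train.shape[0]
        dists = np.zeros((num_test, num_train))
        for i in range(num_test):
            # (num_train, D) - (D,) broadcasts across the training rows
            dists[i, :] = np.sqrt(np.sum((self.X_train - X[i]) ** 2, axis=1))
        return dists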
    # Now implement the fully vectorized version inside compute_distances_no_loops and run the code
    dists_two = classifier.compute_distances_no_loops(X_test)
    
    # check that the distance matrix agrees with the one we computed before:
    difference = np.linalg.norm(dists - dists_two, ord='fro')
    print('No loop difference was: %f' % (difference, ))
    if difference < 0.001:
        print('Good! The distance matrices are the same')
    else:
        print('Uh-oh! The distance matrices are different')
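The fully vectorized compute_distances_no_loops is the part the experiment summary below refers to; it relies on expanding the squared distance as ||x - t||² = ||x||² - 2·x·t + ||t||². A sketch:

    import numpy as np

    def compute_distances_no_loops(self, X):
        # squared norms of the test and training points, plus the cross term
        test_sq = np.sum(X ** 2, axis=1).reshape(-1, 1)      # (num_test, 1)
        train_sq = np.sum(self.X_train ** 2, axis=1)         # (num_train,)
        cross = X.dot(self.X_train.T)                        # (num_test, num_train)
        # broadcasting assembles the full (num_test, num_train) matrix at once
        dists = np.sqrt(test_sq - 2 * cross + train_sq)
        return dists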
(9) Compare the speed of the different distance implementations
    # Let's compare how fast the implementations are
    def time_function(f, *args):
        """
        Call a function f with args and return the time (in seconds) that it took to execute.
        """
        import time
        tic = time.time()
        f(*args)
        toc = time.time()
        return toc - tic
    
    two_loop_time = time_function(classifier.compute_distances_two_loops, X_test)
    print('Two loop version took %f seconds' % two_loop_time)
    
    one_loop_time = time_function(classifier.compute_distances_one_loop, X_test)
    print('One loop version took %f seconds' % one_loop_time)
    
    no_loop_time = time_function(classifier.compute_distances_no_loops, X_test)
    print('No loop version took %f seconds' % no_loop_time)
    
    # You should see significantly faster performance with the fully vectorized implementation!
    
    # NOTE: depending on what machine you're using, 
    # you might not see a speedup when you go from two loops to one loop, 
    # and might even see a slow-down.
(10) Cross-validation
    num_folds = 5
    k_choices = [1, 3, 5, 8, 10, 12, 15, 20, 50, 100]
    
    X_train_folds = []
    y_train_folds = []
    ################################################################################
    # TODO:                                                                        #
    # Split up the training data into folds. After splitting, X_train_folds and    #
    # y_train_folds should each be lists of length num_folds, where                #
    # y_train_folds[i] is the label vector for the points in X_train_folds[i].     #
    # Hint: Look up the numpy array_split function.                                #
    ################################################################################
    pass
    # split self.X_train to 5 folds
    avg_size = int(X_train.shape[0] / num_folds) # will abandon the rest if not divided evenly.
    for i in range(num_folds):
        X_train_folds.append(X_train[i * avg_size : (i+1) * avg_size])
        y_train_folds.append(y_train[i * avg_size : (i+1) * avg_size])
    ################################################################################
    #                                 END OF YOUR CODE                             #
    ################################################################################
    
    # A dictionary holding the accuracies for different values of k that we find
    # when running cross-validation. After running cross-validation,
    # k_to_accuracies[k] should be a list of length num_folds giving the different
    # accuracy values that we found when using that value of k.
    k_to_accuracies = {}
    
    
    ################################################################################
    # TODO:                                                                        #
    # Perform k-fold cross validation to find the best value of k. For each        #
    # possible value of k, run the k-nearest-neighbor algorithm num_folds times,   #
    # where in each case you use all but one of the folds as training data and the #
    # last fold as a validation set. Store the accuracies for all fold and all     #
    # values of k in the k_to_accuracies dictionary.                               #
    ################################################################################
    pass
    for k in k_choices:
        accuracies = []
        print(k)
        for i in range(num_folds):
            X_train_cv = np.vstack(X_train_folds[0:i] + X_train_folds[i+1:])
            y_train_cv = np.hstack(y_train_folds[0:i] + y_train_folds[i+1:])
            X_valid_cv = X_train_folds[i]
            y_valid_cv = y_train_folds[i]

            classifier.train(X_train_cv, y_train_cv)
            dists = classifier.compute_distances_no_loops(X_valid_cv)
            accuracy = float(np.sum(classifier.predict_labels(dists, k) == y_valid_cv)) / y_valid_cv.shape[0]
            accuracies.append(accuracy)
        k_to_accuracies[k] = accuracies
    ################################################################################
    #                                 END OF YOUR CODE                             #
    ################################################################################
    
    # Print out the computed accuracies
    for k in sorted(k_to_accuracies):
        for accuracy in k_to_accuracies[k]:
            print('k = %d, accuracy = %f' % (k, accuracy))
(11) Visualize the accuracy of k-nearest neighbors for each k
    # plot the raw observations
    for k in k_choices:
      accuracies = k_to_accuracies[k]
      plt.scatter([k] * len(accuracies), accuracies)
    
    # plot the trend line with error bars that correspond to standard deviation
    accuracies_mean = np.array([np.mean(v) for k,v in sorted(k_to_accuracies.items())])
    accuracies_std = np.array([np.std(v) for k,v in sorted(k_to_accuracies.items())])
    plt.errorbar(k_choices, accuracies_mean, yerr=accuracies_std)
    plt.title('Cross-validation on k')
    plt.xlabel('k')
    plt.ylabel('Cross-validation accuracy')
    plt.show()

Based on the cross-validation results above, we can pick the best value of k, retrain the classifier on all of the training data, and evaluate it. The final accuracy on the test set comes out at about 28.1%.
    # Based on the cross-validation results above, choose the best value for k, retrain the classifier using all the training data, and test it on the test data. You should be able to get above 28% accuracy on the test data.
    best_k = 10
    
    classifier = KNearestNeighbor()
    classifier.train(X_train, y_train)
    y_test_pred = classifier.predict(X_test, k=best_k)
    
    # Compute and display the accuracy
    num_correct = np.sum(y_test_pred == y_test)
    accuracy = float(num_correct) / num_test
    print('Got %d / %d correct => accuracy: %f' % (num_correct, num_test, accuracy))

From the output we can see the accuracy reaches above ... This is probably close to the best the ... algorithm can achieve here. So although ... does not perform particularly well, its underlying idea is very simple and it is relatively easy to train.

Experiment summary
  • The assignment walks you from a two-loop implementation, to a one-loop implementation, and finally to a version that relies entirely on numpy's matrix operations.
    • The three lines of compute_distances_no_loops only made sense to me after studying someone else's code.

2. SVM image classification

Setup steps that are the same as in the KNN part are omitted here; the focus below is on the key steps.

Core requirements of the assignment:

  • Complete the two key functions in linear_svm.py (svm_loss_naive and svm_loss_vectorized);
    • derive the gradient analytically and check it numerically (gradient check).

When the two implementations are compared under the same conditions, they produce the same loss, but the vectorized version takes much less time to compute; a sketch of a vectorized implementation is given below.
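The vectorized code itself is not shown in this post, so here is a sketch of one common way to write svm_loss_vectorized, assuming the usual assignment signature (W of shape (D, C), X of shape (N, D), y of shape (N,), scalar reg); treat it as illustrative rather than the exact submitted code:

    import numpy as np

    def svm_loss_vectorized(W, X, y, reg):
        num_train = X.shape[0]
        scores = X.dot(W)                                    # (N, C)
        correct = scores[np.arange(num_train), y][:, None]   # (N, 1) correct-class scores
        margins = np.maximum(0, scores - correct + 1.0)      # hinge margins, delta = 1
        margins[np.arange(num_train), y] = 0                 # do not count the correct class
        loss = margins.sum() / num_train + reg * np.sum(W * W)

        # gradient: each positive margin contributes +x_i to its class column
        # and -x_i to the correct-class column
        binary = (margins > 0).astype(float)                 # (N, C)
        binary[np.arange(num_train), y] = -binary.sum(axis=1)
        dW = X.T.dot(binary) / num_train + 2 * reg * W
        return loss, dW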

(1) Set up the notebook
    # Run some setup code for this notebook.
    import random
    import numpy as np
    from cs231n.data_utils import load_CIFAR10
    import matplotlib.pyplot as plt
    
    # This is a bit of magic to make matplotlib figures appear inline in the notebook rather than in a new window.
    %matplotlib inline
    plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
    plt.rcParams['image.interpolation'] = 'nearest'
    plt.rcParams['image.cmap'] = 'gray'
    
    # Some more magic so that the notebook will reload external python modules;
    # see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
    %load_ext autoreload
    %autoreload 2
(2) Load and preprocess CIFAR-10
    # Load the raw CIFAR-10 data.
    cifar10_dir = 'cs231n/datasets/cifar-10-batches-py'
    
    # Cleaning up variables to prevent loading data multiple times (which may cause memory issue)
    try:
       del X_train, y_train
       del X_test, y_test
       print('Clear previously loaded data.')
    except:
       pass
    
    X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)
    
    # As a sanity check, we print out the size of the training and test data.
    print('Training data shape: ', X_train.shape)
    print('Training labels shape: ', y_train.shape)
    print('Test data shape: ', X_test.shape)
    print('Test labels shape: ', y_test.shape)

Then, as before, visualize the dataset to see what it looks like:
    # Visualize some examples from the dataset.
    # We show a few examples of training images from each class.
    classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
    num_classes = len(classes)
    samples_per_class = 7
    for y, cls in enumerate(classes):
        idxs = np.flatnonzero(y_train == y)
        idxs = np.random.choice(idxs, samples_per_class, replace=False)
        for i, idx in enumerate(idxs):
            plt_idx = i * num_classes + y + 1
            plt.subplot(samples_per_class, num_classes, plt_idx)
            plt.imshow(X_train[idx].astype('uint8'))
            plt.axis('off')
            if i == 0:
                plt.title(cls)
    plt.show()
(3) Split the data into training, validation, test, and dev sets
    # Split the data into train, val, and test sets. In addition we will create a small development set as a subset of the training data; we can use this for development so our code runs faster.
    num_training = 49000
    num_validation = 1000
    num_test = 1000
    num_dev = 500
    
    # Our validation set will be num_validation points from the original training set.
    mask = range(num_training, num_training + num_validation)
    X_val = X_train[mask]
    y_val = y_train[mask]
    
    # Our training set will be the first num_train points from the original training set.
    mask = range(num_training)
    X_train = X_train[mask]
    y_train = y_train[mask]
    
    # We will also make a development set, which is a small subset of the training set.
    mask = np.random.choice(num_training, num_dev, replace=False)
    X_dev = X_train[mask]
    y_dev = y_train[mask]
    
    # We use the first num_test points of the original test set as our test set.
    mask = range(num_test)
    X_test = X_test[mask]
    y_test = y_test[mask]
    
    print('Train data shape: ', X_train.shape)
    print('Train labels shape: ', y_train.shape)
    print('Validation data shape: ', X_val.shape)
    print('Validation labels shape: ', y_val.shape)
    print('Test data shape: ', X_test.shape)
    print('Test labels shape: ', y_test.shape)
(4) Data preprocessing

Preprocessing step 1: reshape the image data into rows and print the shapes to check:
    # Preprocessing: reshape the image data into rows
    X_train = np.reshape(X_train, (X_train.shape[0], -1))
    X_val = np.reshape(X_val, (X_val.shape[0], -1))
    X_test = np.reshape(X_test, (X_test.shape[0], -1))
    X_dev = np.reshape(X_dev, (X_dev.shape[0], -1))
    
    # As a sanity check, print out the shapes of the data
    print('Training data shape: ', X_train.shape)
    print('Validation data shape: ', X_val.shape)
    print('Test data shape: ', X_test.shape)
    print('dev data shape: ', X_dev.shape)

Preprocessing step 2a: subtract the mean image; first, compute the image mean from the training data:
    # Preprocessing: subtract the mean image first: compute the image mean based on the training data
    mean_image = np.mean(X_train, axis=0)
    print(mean_image[:10]) # print a few of the elements
    plt.figure(figsize=(4,4))
    plt.imshow(mean_image.reshape((32,32,3)).astype('uint8')) # visualize the mean image
    plt.show()

Preprocessing step 2b: subtract the mean image from the train and test data:
    # second: subtract the mean image from train and test data
    X_train -= mean_image
    X_val -= mean_image
    X_test -= mean_image
    X_dev -= mean_image

Preprocessing step 2c: append a bias dimension of ones (the bias trick) so that the SVM only has to optimize a single weight matrix W:
    # third: append the bias dimension of ones (i.e. bias trick) so that our SVM only has to worry about optimizing a single weight matrix W.
    X_train = np.hstack([X_train, np.ones((X_train.shape[0], 1))])
    X_val = np.hstack([X_val, np.ones((X_val.shape[0], 1))])
    X_test = np.hstack([X_test, np.ones((X_test.shape[0], 1))])
    X_dev = np.hstack([X_dev, np.ones((X_dev.shape[0], 1))])
    
    print(X_train.shape, X_val.shape, X_test.shape, X_dev.shape)
(5) Build the SVM classifier
  • The code for this part all goes into the cs231n/classifiers/linear_svm.py file.
  • The assignment already provides the svm_loss_naive function skeleton for computing the SVM loss.

In linear_svm.py, the part you fill in is the gradient computation; a sketch of what it can look like follows.

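Here is a sketch of svm_loss_naive with its gradient, in the standard loop-based form (my submitted version may differ in details), using the same signature assumptions as the vectorized sketch earlier:

    import numpy as np

    def svm_loss_naive(W, X, y, reg):
        dW = np.zeros(W.shape)
        num_classes = W.shape[1]
        num_train = X.shape[0]
        loss = 0.0
        for i in range(num_train):
            scores = X[i].dot(W)
            correct_class_score = scores[y[i]]
            for j in range(num_classes):
                if j == y[i]:
                    continue
                margin = scores[j] - correct_class_score + 1  # delta = 1
                if margin > 0:
                    loss += margin
                    dW[:, j] += X[i]        # gradient w.r.t. the wrong-class column
                    dW[:, y[i]] -= X[i]     # gradient w.r.t. the correct-class column
        loss /= num_train
        dW /= num_train
        loss += reg * np.sum(W * W)         # L2 regularization
        dW += 2 * reg * W
        return loss, dW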

After that, evaluate the loss:
    # Evaluate the naive implementation of the loss we provided for you:
    from cs231n.classifiers.linear_svm import svm_loss_naive
    import time
    
    # generate a random SVM weight matrix of small numbers
    W = np.random.randn(3073, 10) * 0.0001 
    
    loss, grad = svm_loss_naive(W, X_dev, y_dev, 0.00001)
    print('loss: %f' % (loss, ))
(6) Check the gradient after writing the gradient code yourself
    # Once you've implemented the gradient, recompute it with the code below and gradient check it with the function we provide. Compute the loss and its gradient at W.
    loss, grad = svm_loss_naive(W, X_dev, y_dev, 0.0)
    
    # Numerically compute the gradient along several randomly chosen dimensions, and compare them with your analytically computed gradient. The numbers should match  almost exactly along all dimensions.
    from cs231n.gradient_check import grad_check_sparse
    f = lambda w: svm_loss_naive(w, X_dev, y_dev, 0.0)[0]
    grad_numerical = grad_check_sparse(f, W, grad)
    
    # do the gradient check once again with regularization turned on; you didn't forget the regularization gradient, did you?
    loss, grad = svm_loss_naive(W, X_dev, y_dev, 1e2)
    f = lambda w: svm_loss_naive(w, X_dev, y_dev, 1e2)[0]
    grad_numerical = grad_check_sparse(f, W, grad)
(7) Optimize the weights with gradient updates (SGD)
    # In the file linear_classifier.py, implement SGD in the function
    # LinearClassifier.train() and then run it with the code below.
    from cs231n.classifiers import LinearSVM
    svm = LinearSVM()
    tic = time.time()
    loss_hist = svm.train(X_train, y_train, learning_rate=1e-7, reg=5e4,
                      num_iters=1500, verbose=True)
    toc = time.time()
    print('That took %fs' % (toc - tic))
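The SGD loop itself goes into LinearClassifier.train() in linear_classifier.py and is not shown above; a simplified sketch of its core, based on the standard assignment skeleton (illustrative only):

    import numpy as np

    def train(self, X, y, learning_rate=1e-3, reg=1e-5, num_iters=100,
              batch_size=200, verbose=False):
        num_train, dim = X.shape
        num_classes = np.max(y) + 1
        if self.W is None:
            self.W = 0.001 * np.random.randn(dim, num_classes)

        loss_history = []
        for it in range(num_iters):
            # sample a random minibatch
            idx = np.random.choice(num_train, batch_size, replace=True)
            X_batch, y_batch = X[idx], y[idx]

            # self.loss() calls svm_loss_vectorized for LinearSVM
            loss, grad = self.loss(X_batch, y_batch, reg)
            loss_history.append(loss)

            # vanilla SGD update
            self.W -= learning_rate * grad

            if verbose and it % 100 == 0:
                print('iteration %d / %d: loss %f' % (it, num_iters, loss))
        return loss_history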

Plot the loss against the iteration number:
    # A useful debugging strategy is to plot the loss as a function of iteration number:
    plt.plot(loss_hist)
    plt.xlabel('Iteration number')
    plt.ylabel('Loss value')
    plt.show()
(8) Tune the hyperparameters (regularization strength and learning rate)
    # Use the validation set to tune hyperparameters (regularization strength and learning rate). You should experiment with different ranges for the learning rates and regularization strengths; if you are careful you should be able to get a classification accuracy of about 0.4 on the validation set.
    learning_rates = [1e-7, 5e-6]
    regularization_strengths = [5e4, 1e5]
    
    # results is dictionary mapping tuples of the form (learning_rate, regularization_strength) to tuples of the form (training_accuracy, validation_accuracy). The accuracy is simply the fraction of data points that are correctly classified.
    results = {}
    best_val = -1   # The highest validation accuracy that we have seen so far.
    best_svm = None # The LinearSVM object that achieved the highest validation rate.
    
    ################################################################################
    # TODO:                                                                        #
    # Write code that chooses the best hyperparameters by tuning on the validation #
    # set. For each combination of hyperparameters, train a linear SVM on the      #
    # training set, compute its accuracy on the training and validation sets, and  #
    # store these numbers in the results dictionary. In addition, store the best   #
    # validation accuracy in best_val and the LinearSVM object that achieves this  #
    # accuracy in best_svm.                                                        #
    #                                                                              #
    # Hint: You should use a small value for num_iters as you develop your         #
    # validation code so that the SVMs don't take much time to train; once you are #
    # confident that your validation code works, you should rerun the validation   #
    # code with a larger value for num_iters.                                      #
    ################################################################################
    pass
    for lr in learning_rates:
        for rs in regularization_strengths:
            svm = LinearSVM()
            loss_hist = svm.train(X_train, y_train, learning_rate=lr, reg=rs,
                                  num_iters=1500, verbose=True)
            y_train_pred = svm.predict(X_train)
            train_acc = np.mean(y_train == y_train_pred)
            y_val_pred = svm.predict(X_val)
            val_acc = np.mean(y_val == y_val_pred)

            results[(lr, rs)] = (train_acc, val_acc)

            if val_acc > best_val:
                best_val = val_acc
                best_svm = svm
    ################################################################################
    #                              END OF YOUR CODE                                #
    ################################################################################
    
    # Print out results.
    for lr, reg in sorted(results):
        train_accuracy, val_accuracy = results[(lr, reg)]
        print('lr %e reg %e train accuracy: %f val accuracy: %f' % (
                    lr, reg, train_accuracy, val_accuracy))

    print('best validation accuracy achieved during cross-validation: %f' % best_val)

After tuning, the accuracy for each hyperparameter combination and the best validation accuracy are printed.
(9) Visualize the learned weights for each class
    # Visualize the learned weights for each class.
    # Depending on your choice of learning rate and regularization strength, these may or may not be nice to look at.
    w = best_svm.W[:-1,:] # strip out the bias
    w = w.reshape(32, 32, 3, 10)
    w_min, w_max = np.min(w), np.max(w)
    classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
    for i in range(10):
      plt.subplot(2, 5, i + 1)
    
      # Rescale the weights to be between 0 and 255
      wimg = 255.0 * (w[:, :, :, i].squeeze() - w_min) / (w_max - w_min)
      plt.imshow(wimg.astype('uint8'))
      plt.axis('off')
      plt.title(classes[i])
