Neural Networks and Deep Learning, Week 2: Logistic Regression with a Neural Network Mindset
Main goals of this programming assignment
- The assignment walks you through building an image classification model.
- Along the way it covers initializing the model's parameters, evaluating a cost function, and optimizing the parameters with gradient descent.
You may find the following reference interesting:
http://www.wildml.com/2015/09/implementing-a-neural-network-from-scratch/ This article walks through building a neural network from scratch and is a good way to deepen your understanding of the details; the accompanying code repository on GitHub contains the full implementation.
It is also meant to help you understand the motivation behind the image standardization we perform later.
Overall workflow
1. Import the required packages.
2. Load the dataset and preprocess its shape.
3. General architecture of the learning algorithm: initialize the model's parameters, train the model to optimize those parameters, use the learned parameters to make predictions, and analyze the results.
4. Build our own neural network:
   - 4.1 Helper function: sigmoid
   - 4.2 Initialize the model parameters: initialize_with_zeros(dim)
   - 4.3 Forward and backward propagation: propagate(w, b, X, Y). Note that this assignment uses a single-layer network (no hidden layer), so forward and backward propagation are merged into one function; normally they are handled separately, as the code in the coming weeks will show.
   - 4.4 Train the parameters with optimize(w, b, X, Y, num_iterations, learning_rate, print_cost = False) and predict labels with predict(w, b, X).
5. Merge everything into model(X_train, Y_train, X_test, Y_test), which by default trains for 2000 iterations with a learning rate of 0.5 and does not print the cost.
6. Further analysis: once the assignment is done, take a closer look at gradient descent, just as in the machine learning course. Try changing the learning rate: if it is too large the cost oscillates, if it is too small convergence is slow.
7. Test the model on our own images.
Before each code cell you need to complete, I will note the goal of the task.
Logistic Regression with a Neural Network mindset
Welcome to your first (mandatory) programming coursework! You are tasked with constructing a logistic regression classifier capable of identifying cat images. This coursework will guide you through the process of implementing this using a neural network perspective, while also fostering an intuitive understanding of deep learning concepts.
Instructions:
Do not use loops (for/while) in your code unless the instructions explicitly ask you to.
You will learn to:
- Build the general architecture of a learning algorithm, including:
  - Initializing parameters
  - Calculating the cost function and its gradient
  - Using an optimization algorithm (gradient descent)
- Gather all three functions above into a main model function, in the right order.
Updates
This notebook has been updated over the past few months. The prior version was named "v5", and the current version is "v6a".
If you were working on a previous version:
- You can find your prior work by looking in the file directory for the older files (named by version name).
- To view the file directory, click on the "Coursera" icon in the top left corner of this notebook.
- Please copy your work from the older versions into the most recent notebook before submitting.
List of Updates
- The forward computation process has been adjusted to begin indexing with 1 rather than 0.
- In the comments of the optimize function, the instruction now reads "Print the cost every 100 training iterations" instead of "examples".
- Grammar issues within the comments section have been corrected.
- The Y_prediction_test variable name has been consistently utilized across all code sections.
- The plot's x-axis label now reads "iterations (hundreds)" rather than just "iterations".
- During testing, images are standardized by dividing each pixel value by 255.
1 - Packages
Initially, let us execute the code in the cell below to import all the necessary packages for this assignment.
- numpy plays a vital role in scientific computing using Python.
- h5py enables users to work with data stored in H5 files.
- matplotlib offers comprehensive tools for generating publication-quality graphs in Python.
- PIL and scipy are used here to test your model with your own picture at the end.
import numpy as np
import matplotlib.pyplot as plt
import h5py
import scipy
from PIL import Image
from scipy import ndimage
from lr_utils import load_dataset
%matplotlib inline
2 - Overview of the Problem set
Problem Statement : You are given a dataset ("data.h5") containing:
- a training set of m_train images labeled as cat (y=1) or non-cat (y=0)
- a test set of m_test images labeled as cat or non-cat
- each image is of shape (num_px, num_px, 3) where 3 is for the 3 channels (RGB). Thus, each image is square (height = num_px) and (width = num_px).
You will build a simple image-recognition algorithm that correctly classifies pictures as cat or non-cat.
Let's get more familiar with the dataset. Load the data by running the code below.
# Loading the data (cat/non-cat)
train_set_x_orig, train_set_y, test_set_x_orig, test_set_y, classes = load_dataset()
We included '_orig' at the end of image datasets (train and test) since we are going to preprocess them. After preprocessing, we will end up with train_set_x and test_set_x (the labels train_set_y and test_set_y do not require any preprocessing).
Each line in your train_set_x_orig and test_set_x_orig represents an image as an array. You are able to visualize a sample by executing the provided code. Please also feel free to adjust the index value and re-run the code to view additional images.
# Example of a picture
index = 25
plt.imshow(train_set_x_orig[index])
print ("y = " + str(train_set_y[:, index]) + ", it's a '" + classes[np.squeeze(train_set_y[:, index])].decode("utf-8") + "' picture.")
Many software bugs in deep learning come from matrix/vector dimensions that don't fit. If you can keep your matrix and vector dimensions straight, you will go a long way toward eliminating many bugs.
Exercise: Find the values for:
- m_train (number of training examples)
- m_test (number of test examples)
- num_px (= height = width of a training image)
Remember that train_set_x_orig is a numpy array of shape (m_train, num_px, num_px, 3). For instance, you can access m_train by writing train_set_x_orig.shape[0].
Task: extract m_train, m_test, and num_px.
Hint: look at the shape of train_set_x_orig.
### START CODE HERE ### (≈ 3 lines of code)
m_train = train_set_x_orig.shape[0]
m_test = test_set_x_orig.shape[0]
num_px = train_set_x_orig.shape[1]
### END CODE HERE ###
print ("Number of training examples: m_train = " + str(m_train))
print ("Number of testing examples: m_test = " + str(m_test))
print ("Height/Width of each image: num_px = " + str(num_px))
print ("Each image is of size: (" + str(num_px) + ", " + str(num_px) + ", 3)")
print ("train_set_x shape: " + str(train_set_x_orig.shape))
print ("train_set_y shape: " + str(train_set_y.shape))
print ("test_set_x shape: " + str(test_set_x_orig.shape))
print ("test_set_y shape: " + str(test_set_y.shape))
Expected Output for m_train, m_test and num_px :
| m_train | 209 |
|---|---|
| m_test | 50 |
| num_px | 64 |
For convenience, you should now reshape images of shape (num_px, num_px, 3) into a numpy array of shape (num_px * num_px * 3, 1). After this, our training and test datasets are numpy arrays where each column represents a flattened image. There should be m_train (respectively m_test) columns.
Exercise: Reshape the training and test datasets so that each image of shape (num_px, num_px, 3) is flattened into a single vector of shape (num_px * num_px * 3, 1).
A trick when you want to flatten a matrix X of shape (a, b, c, d) into a matrix X_flatten of shape (b*c*d, a) is to use:
X_flatten = X.reshape(X.shape[0], -1).T # X.T is the transpose of X
Task: reshape the original data.
Hint: to reshape X of shape (a, b, c, d) into shape (b*c*d, a), use X_flatten = X.reshape(X.shape[0], -1).T.
# Reshape the training and test examples
### START CODE HERE ### (≈ 2 lines of code)
train_set_x_flatten = train_set_x_orig.reshape(train_set_x_orig.shape[0],-1).T
test_set_x_flatten = test_set_x_orig.reshape(test_set_x_orig.shape[0],-1).T
### END CODE HERE ###
print ("train_set_x_flatten shape: " + str(train_set_x_flatten.shape))
print ("train_set_y shape: " + str(train_set_y.shape))
print ("test_set_x_flatten shape: " + str(test_set_x_flatten.shape))
print ("test_set_y shape: " + str(test_set_y.shape))
print ("sanity check after reshaping: " + str(train_set_x_flatten[0:5,0]))
Expected Output :
| train_set_x_flatten shape | (12288, 209) |
|---|---|
| train_set_y shape | (1, 209) |
| test_set_x_flatten shape | (12288, 50) |
| test_set_y shape | (1, 50) |
| sanity check after reshaping | [17 31 56 22 33] |
To represent color images, the red, green, and blue channels (RGB) must be specified for each pixel, so each pixel is actually a vector of three numbers ranging from 0 to 255.
One common preprocessing step in machine learning is to center and standardize your dataset: subtract the mean of the whole numpy array from each example, then divide each example by the standard deviation of the whole array. For picture datasets, it is simpler and works almost as well to just divide every row of the dataset by 255 (the maximum value of a pixel channel).
Let's standardize our dataset.
train_set_x = train_set_x_flatten/255.
test_set_x = test_set_x_flatten/255.
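For reference only, here is a minimal sketch of the more general standardization described above (subtract the mean, divide by the standard deviation). The rest of the assignment keeps the simple division by 255, and the variable names ending in _std are mine:

# Illustrative sketch: center and scale using statistics computed on the training set only
mean = train_set_x_flatten.mean()
std = train_set_x_flatten.std()
train_set_x_std = (train_set_x_flatten - mean) / std
test_set_x_std = (test_set_x_flatten - mean) / std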
What you need to remember:
Common steps for pre-processing a new dataset are:
- Figure out the dimensions and shapes of the problem (m_train, m_test, num_px, ...).
- Reshape the datasets so that each example is now a vector of size (num_px * num_px * 3, 1).
- Standardize the data.
3 - General Architecture of the learning algorithm
It's time to design a simple algorithm to distinguish cat images from non-cat images.
You will build a logistic regression model using a neural network mindset. The following figure explains why logistic regression is actually a very simple neural network.

Mathematical expression of the algorithm:
For one example $x^{(i)}$:
$$z^{(i)} = w^T x^{(i)} + b \tag{1}$$
$$\hat{y}^{(i)} = a^{(i)} = \mathrm{sigmoid}(z^{(i)}) \tag{2}$$
$$\mathcal{L}(a^{(i)}, y^{(i)}) = -y^{(i)}\log(a^{(i)}) - (1 - y^{(i)})\log(1 - a^{(i)}) \tag{3}$$
The cost is then computed by summing over all training examples:
$$J = \frac{1}{m}\sum_{i=1}^{m} \mathcal{L}(a^{(i)}, y^{(i)}) \tag{4}$$
Key steps : In this exercise, you will carry out the following steps:
- Initialize the parameters of the model
- Learn the parameters for the model by minimizing the cost
- Use the learned parameters to make predictions (on the test set)
- Analyse the results and conclude
4 - Building the parts of our algorithm
The main steps for building a Neural Network are:
- Define the model structure (such as the number of input features)
- Initialize the model's parameters
- Loop:
  - Calculate the current loss (forward propagation)
  - Calculate the current gradient (backward propagation)
  - Update the parameters (gradient descent)
You often build 1-3 separately and then integrate them into one function called model().
4.1 - Helper functions
Exercise: Using your code from "Python Basics", implement sigmoid(). As the figure above shows, you need to compute
$$\mathrm{sigmoid}(w^T x + b) = \frac{1}{1 + e^{-(w^T x + b)}}$$
to make predictions. Use np.exp().
Task: implement the sigmoid function. Hint: use np.exp(). If you completed the previous lab, this should already be familiar.
# GRADED FUNCTION: sigmoid
def sigmoid(z):
    """
    Compute the sigmoid of z

    Arguments:
    z -- A scalar or numpy array of any size.

    Return:
    s -- sigmoid(z)
    """
    ### START CODE HERE ### (≈ 1 line of code)
    s = 1/(1+np.exp(-z))
    ### END CODE HERE ###

    return s
print ("sigmoid([0, 2]) = " + str(sigmoid(np.array([0,2]))))
Expected Output :
| sigmoid([0, 2]) | [ 0.5 0.88079708] |
|---|
4.2 - Initializing parameters
Exercise: Implement parameter initialization in the cell below. You have to initialize w as a vector of zeros. If you don't know which numpy function to use, look up np.zeros() in the documentation.
Task: initialize the parameters to zeros, keeping an eye on their shapes. If you forget or are unsure of the details, refer to the annotated code below. For this assignment you can ignore the symmetry-breaking issue; zero initialization is fine for logistic regression.
# GRADED FUNCTION: initialize_with_zeros
def initialize_with_zeros(dim):
    """
    This function creates a vector of zeros of shape (dim, 1) for w and initializes b to 0.

    Argument:
    dim -- size of the w vector we want (or number of parameters in this case)

    Returns:
    w -- initialized vector of shape (dim, 1)
    b -- initialized scalar (corresponds to the bias)
    """
    ### START CODE HERE ### (≈ 1 line of code)
    w = np.zeros((dim, 1))
    b = 0
    ### END CODE HERE ###

    assert(w.shape == (dim, 1))
    assert(isinstance(b, float) or isinstance(b, int))

    return w, b
dim = 2
w, b = initialize_with_zeros(dim)
print ("w = " + str(w))
print ("b = " + str(b))
Expected Output :
| w | [[ 0.] [ 0.]] |
|---|---|
| b | 0 |
For image inputs, w will be of shape (num_px * num_px * 3, 1).
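As a minimal check of that statement (my own example; it assumes num_px = 64 as in this dataset, so the vector has 64 * 64 * 3 = 12288 entries):

w, b = initialize_with_zeros(num_px * num_px * 3)
print(w.shape)   # (12288, 1)
print(b)         # 0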
4.3 - Forward and Backward propagation
Once your variables have been initialized, you can perform the forward and backward propagation procedures for learning the model’s weights.
Exercise: Write a code block named propagate() to compute the cost function and its gradient.
Hints:
Forward Propagation:
- You get X
- You compute $A = \sigma(w^T X + b) = (a^{(1)}, a^{(2)}, \dots, a^{(m)})$
- You calculate the cost function: $J = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log(a^{(i)}) + (1 - y^{(i)})\log(1 - a^{(i)})\right]$
Here are the two formulas you will be using:
$$\frac{\partial J}{\partial w} = \frac{1}{m} X (A - Y)^T$$
$$\frac{\partial J}{\partial b} = \frac{1}{m}\sum_{i=1}^{m} (a^{(i)} - y^{(i)})$$
Task: compute the cost J and the backward results dw and db.
Hint: compute the activation A in the forward pass first, then reuse it for both the cost and the gradients.
# GRADED FUNCTION: propagate
def propagate(w, b, X, Y):
    """
    Implement the cost function and its gradient for the propagation explained above

    Arguments:
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of size (num_px * num_px * 3, number of examples)
    Y -- true "label" vector (containing 0 if non-cat, 1 if cat) of size (1, number of examples)

    Return:
    cost -- negative log-likelihood cost for logistic regression
    dw -- gradient of the loss with respect to w, thus same shape as w
    db -- gradient of the loss with respect to b, thus same shape as b

    Tips:
    - Write your code step by step for the propagation. np.log(), np.dot()
    """
    m = X.shape[1]

    # FORWARD PROPAGATION (FROM X TO COST)
    ### START CODE HERE ### (≈ 2 lines of code)
    A = sigmoid(np.dot(w.T, X) + b)                            # compute activation
    cost = -1/m * np.sum(Y*np.log(A) + (1-Y)*np.log(1-A))      # compute cost
    ### END CODE HERE ###

    # BACKWARD PROPAGATION (TO FIND GRAD)
    ### START CODE HERE ### (≈ 2 lines of code)
    dw = 1/m * np.dot(X, (A-Y).T)
    db = 1/m * np.sum(A-Y)
    ### END CODE HERE ###

    assert(dw.shape == w.shape)
    assert(db.dtype == float)
    cost = np.squeeze(cost)
    assert(cost.shape == ())

    grads = {"dw": dw,
             "db": db}

    return grads, cost
w, b, X, Y = np.array([[1.],[2.]]), 2., np.array([[1.,2.,-1.],[3.,4.,-3.2]]), np.array([[1,0,1]])
grads, cost = propagate(w, b, X, Y)
print ("dw = " + str(grads["dw"]))
print ("db = " + str(grads["db"]))
print ("cost = " + str(cost))
Expected Output :
| dw | [[ 0.99845601] [ 2.39507239]] |
|---|---|
| db | 0.00145557813678 |
| cost | 5.801545319394553 |
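As an optional sanity check (not part of the graded notebook), you can compare the analytic gradients against a numerical finite-difference estimate; the sketch below reuses propagate and the small w, b, X, Y defined in the test cell above, and only checks db:

# Illustrative numerical gradient check for db
eps = 1e-7
grads, cost = propagate(w, b, X, Y)
_, cost_plus = propagate(w, b + eps, X, Y)
_, cost_minus = propagate(w, b - eps, X, Y)
db_approx = (cost_plus - cost_minus) / (2 * eps)
print(db_approx, grads["db"])   # the two values should be very close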
4.4 - Optimization
- You have initialized your parameters.
- You can compute the cost function and its gradient.
- Now, you want to update the parameters using gradient descent.
Exercise: Write down the optimization function. The goal is to learn w and b by minimizing the cost function J. For a parameter θ, the update rule is θ = θ − α dθ, where α is the learning rate.
Task: complete the (vectorized) training loop.
Note that propagate(w, b, X, Y) has already been implemented.
Recall the overall process: run forward propagation to compute the cost J, then backward propagation to get the gradients and update the parameters; repeat this num_iterations times to train the model.
# GRADED FUNCTION: optimize
def optimize(w, b, X, Y, num_iterations, learning_rate, print_cost = False):
    """
    This function optimizes w and b by running a gradient descent algorithm

    Arguments:
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of shape (num_px * num_px * 3, number of examples)
    Y -- true "label" vector (containing 0 if non-cat, 1 if cat), of shape (1, number of examples)
    num_iterations -- number of iterations of the optimization loop
    learning_rate -- learning rate of the gradient descent update rule
    print_cost -- True to print the loss every 100 steps

    Returns:
    params -- dictionary containing the weights w and bias b
    grads -- dictionary containing the gradients of the weights and bias with respect to the cost function
    costs -- list of all the costs computed during the optimization, this will be used to plot the learning curve.

    Tips:
    You basically need to write down two steps and iterate through them:
        1) Calculate the cost and the gradient for the current parameters. Use propagate().
        2) Update the parameters using gradient descent rule for w and b.
    """
    costs = []

    for i in range(num_iterations):

        # Cost and gradient calculation (≈ 1-4 lines of code)
        ### START CODE HERE ###
        grads, cost = propagate(w, b, X, Y)
        ### END CODE HERE ###

        # Retrieve derivatives from grads
        dw = grads["dw"]
        db = grads["db"]

        # Update rule (≈ 2 lines of code)
        ### START CODE HERE ###
        w = w - learning_rate * dw
        b = b - learning_rate * db
        ### END CODE HERE ###

        # Record the costs
        if i % 100 == 0:
            costs.append(cost)

        # Print the cost every 100 training iterations
        if print_cost and i % 100 == 0:
            print ("Cost after iteration %i: %f" %(i, cost))

    params = {"w": w,
              "b": b}

    grads = {"dw": dw,
             "db": db}

    return params, grads, costs
params, grads, costs = optimize(w, b, X, Y, num_iterations= 100, learning_rate = 0.009, print_cost = False)
print ("w = " + str(params["w"]))
print ("b = " + str(params["b"]))
print ("dw = " + str(grads["dw"]))
print ("db = " + str(grads["db"]))
Expected Output :
| w | [[ 0.19033591] [ 0.12259159]] |
|---|---|
| b | 1.92535983008 |
| dw | [[ 0.67752042] [ 1.41625495]] |
| db | 0.219194504541 |
Exercise: The previous function outputs the learned w and b. We can use w and b to predict the labels for a dataset X. Implement the predict() function. Computing predictions involves two steps:
1. Calculate $\hat{Y} = A = \sigma(w^T X + b)$.
2. Convert each entry of A into 0 (if the activation is <= 0.5) or 1 (if the activation is > 0.5), and store the predictions in the vector Y_prediction. If you wish, you can use an if/else statement inside a for loop (though there is also a way to vectorize this; see the sketch below).
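As a side sketch of the vectorized alternative mentioned above (mine, not the graded solution, which asks for the loop), assuming A has already been computed:

Y_prediction = (A > 0.5).astype(float)   # shape (1, m), entries 0.0 or 1.0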
Once the parameters have been trained, we use them to make predictions on images.
# GRADED FUNCTION: predict
def predict(w, b, X):
    '''
    Predict whether the label is 0 or 1 using learned logistic regression parameters (w, b)

    Arguments:
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of size (num_px * num_px * 3, number of examples)

    Returns:
    Y_prediction -- a numpy array (vector) containing all predictions (0/1) for the examples in X
    '''
    m = X.shape[1]
    Y_prediction = np.zeros((1, m))
    w = w.reshape(X.shape[0], 1)

    # Compute vector "A" predicting the probabilities of a cat being present in the picture
    ### START CODE HERE ### (≈ 1 line of code)
    A = sigmoid(np.dot(w.T, X) + b)
    ### END CODE HERE ###

    for i in range(A.shape[1]):

        # Convert probabilities A[0,i] to actual predictions p[0,i]
        ### START CODE HERE ### (≈ 4 lines of code)
        if A[0, i] < 0.5:
            Y_prediction[0, i] = 0
        else:
            Y_prediction[0, i] = 1
        ### END CODE HERE ###

    assert(Y_prediction.shape == (1, m))

    return Y_prediction
w = np.array([[0.1124579],[0.23106775]])
b = -0.3
X = np.array([[1.,-1.1,-3.2],[1.2,2.,0.1]])
print ("predictions = " + str(predict(w, b, X)))
Expected Output :
| predictions | [[ 1. 1. 0.]] |
|---|
What to remember: You've implemented several functions that:
- Set initial values for (w, b)
- Iteratively minimize the cost function to determine optimal values for both w and b:
- Calculate both the cost and its gradient with respect to w and b
- Update w and b using gradient descent method
- Employ these learned parameters to predict labels for new data instances
5 - Merge all functions into a model
You will now see how the overall model is structured by putting together all the building blocks (the functions implemented in the previous parts), in the right order.
Exercise: Implement the model function. Use the following notation:
- Y_prediction_test for your predictions on the test set
- Y_prediction_train for your predictions on the train set
- w, costs, grads for the outputs of optimize()
Now that all the pieces are complete, we need to put them together: initialization, training, and prediction. Note that the purpose of training is to obtain the parameters.
Initialize the weights as a zero vector with initialize_with_zeros(dim), optimize the parameters with gradient descent via optimize(w, b, X, Y, num_iterations, learning_rate, print_cost=False), and then call the prediction method predict(w, b, X).
# GRADED FUNCTION: model
def model(X_train, Y_train, X_test, Y_test, num_iterations = 2000, learning_rate = 0.5, print_cost = False):
    """
    Builds the logistic regression model by calling the function you've implemented previously

    Arguments:
    X_train -- training set represented by a numpy array of shape (num_px * num_px * 3, m_train)
    Y_train -- training labels represented by a numpy array (vector) of shape (1, m_train)
    X_test -- test set represented by a numpy array of shape (num_px * num_px * 3, m_test)
    Y_test -- test labels represented by a numpy array (vector) of shape (1, m_test)
    num_iterations -- hyperparameter representing the number of iterations to optimize the parameters
    learning_rate -- hyperparameter representing the learning rate used in the update rule of optimize()
    print_cost -- Set to true to print the cost every 100 iterations

    Returns:
    d -- dictionary containing information about the model.
    """
    ### START CODE HERE ###

    # initialize parameters with zeros (≈ 1 line of code)
    w, b = initialize_with_zeros(X_train.shape[0])

    # Gradient descent (≈ 1 line of code)
    parameters, grads, costs = optimize(w, b, X_train, Y_train, num_iterations, learning_rate, print_cost)

    # Retrieve parameters w and b from dictionary "parameters"
    w = parameters["w"]
    b = parameters["b"]

    # Predict test/train set examples (≈ 2 lines of code)
    Y_prediction_test = predict(w, b, X_test)
    Y_prediction_train = predict(w, b, X_train)

    ### END CODE HERE ###

    # Print train/test Errors
    print("train accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_train - Y_train)) * 100))
    print("test accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_test - Y_test)) * 100))

    d = {"costs": costs,
         "Y_prediction_test": Y_prediction_test,
         "Y_prediction_train" : Y_prediction_train,
         "w" : w,
         "b" : b,
         "learning_rate" : learning_rate,
         "num_iterations": num_iterations}

    return d
Run the following cell to train your model.
d = model(train_set_x, train_set_y, test_set_x, test_set_y, num_iterations = 2000, learning_rate = 0.005, print_cost = True)
Expected Output :
| Cost after iteration 0 | 0.693147 |
|---|---|
| ⋮⋮ | ⋮⋮ |
| Train Accuracy | 99.04306220095694 % |
| Test Accuracy | 70.0 % |
Comment: Training accuracy is close to 100%. This is a good sanity check: your model is working and has enough capacity to fit the training data. Test accuracy is 70%, which is actually not bad for such a simple model, given the small dataset we used and the fact that logistic regression is a linear classifier. No worries, you will build an even better classifier next week!
You can also see that the model is clearly overfitting the training data. Later in this specialization you will learn how to reduce overfitting, for example by using regularization. Using the code below (and changing the index variable), you can look at predictions on pictures from the test set.
# Example of a picture that was wrongly classified.
index = 1
plt.imshow(test_set_x[:,index].reshape((num_px, num_px, 3)))
print ("y = " + str(test_set_y[0,index]) + ", you predicted that it is a \"" + classes[int(d["Y_prediction_test"][0,index])].decode("utf-8") + "\" picture.")
Let's also plot the cost function and the gradients.
# Plot learning curve (with costs)
costs = np.squeeze(d['costs'])
plt.plot(costs)
plt.ylabel('cost')
plt.xlabel('iterations (per hundreds)')
plt.title("Learning rate =" + str(d["learning_rate"]))
plt.show()
Interpretation: You can see the cost decreasing, which shows that the parameters are being learned. However, you could train the model even more on the training set: try increasing the number of iterations in the cell above and rerunning the cells. You might then see the training set accuracy go up while the test set accuracy goes down; this is called overfitting.
6 - Further analysis (optional/ungraded exercise)
Congratulations on the successful creation of your first image classification model. Let us conduct a thorough examination of this model and evaluate various options for the learning rate α.
Choice of learning rate
Reminder: In order for gradient descent to work well, you must choose the learning rate wisely. The learning rate α determines how quickly we update the parameters. If it is too large, we may overshoot the optimal value; if it is too small, we will need many iterations to converge to the best values. That is why a well-tuned learning rate is crucial.
Let's compare the learning curve of our model with several choices of the learning rate. Run the cell below (this should take about one minute); also feel free to try values other than the three we have initialized in the learning_rates variable, and see what happens.
learning_rates = [0.01, 0.001, 0.0001]
models = {}
for i in learning_rates:
    print ("learning rate is: " + str(i))
    models[str(i)] = model(train_set_x, train_set_y, test_set_x, test_set_y, num_iterations = 1500, learning_rate = i, print_cost = False)
    print ('\n' + "-------------------------------------------------------" + '\n')

for i in learning_rates:
    plt.plot(np.squeeze(models[str(i)]["costs"]), label= str(models[str(i)]["learning_rate"]))

plt.ylabel('cost')
plt.xlabel('iterations (hundreds)')

legend = plt.legend(loc='upper center', shadow=True)
frame = legend.get_frame()
frame.set_facecolor('0.90')
plt.show()
Interpretation :
- Different learning rates yield distinct costs and consequently varying prediction outcomes.
- When the learning rate is set too high (e.g., 0.01), the cost may oscillate up and down, potentially diverging (though in this case, using 0.01 still eventually leads to a satisfactory cost value).
- A lower cost does not necessarily indicate a superior model; it's essential to verify for potential overfitting, which typically occurs when training accuracy significantly exceeds test accuracy.
- In deep learning practice:
- Optimize the learning rate to effectively minimize the cost function.
- If your model exhibits signs of overfitting, consider employing other techniques to reduce overfitting.
- We will delve deeper into these methods in subsequent lessons.
7 - Test with your own image (optional/ungraded exercise)
Congratulations on finishing this assignment. You can now use your own image and see the output of your model. To do that:
1. Click on "File" in the upper bar of this notebook, then click "Open" to go to your Coursera Hub.
2. Add your image to this Jupyter Notebook's directory, in the "images" folder
3. Change your image's name in the following code
4. Run the code and check if the algorithm is right (1 = cat, 0 = non-cat)!
## START CODE HERE ## (PUT YOUR IMAGE NAME)
my_image = "my_image.jpg" # change this to the name of your image file
## END CODE HERE ##
# We preprocess the image to fit your algorithm.
fname = "images/" + my_image
image = np.array(ndimage.imread(fname, flatten=False))
image = image/255.
my_image = scipy.misc.imresize(image, size=(num_px,num_px)).reshape((1, num_px*num_px*3)).T
my_predicted_image = predict(d["w"], d["b"], my_image)
plt.imshow(image)
print("y = " + str(np.squeeze(my_predicted_image)) + ", your algorithm predicts a \"" + classes[int(np.squeeze(my_predicted_image)),].decode("utf-8") + "\" picture.")
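Note that ndimage.imread and scipy.misc.imresize were removed from newer SciPy releases, so the cell above only runs in the course's original environment. If you try this elsewhere, a rough equivalent using PIL (already imported above) might look like the following sketch; the file name is just a placeholder:

# Sketch: load and resize an image with PIL instead of the removed scipy helpers
fname = "images/my_image.jpg"
img = Image.open(fname).convert("RGB").resize((num_px, num_px))
image = np.array(img) / 255.
my_image = image.reshape((1, num_px * num_px * 3)).T
my_predicted_image = predict(d["w"], d["b"], my_image)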
What to remember from this assignment:
- Preprocessing the dataset is an important step in the machine learning pipeline.
- You implemented each function separately: initialize(), propagate(), optimize(). Then you built a complete model().
- Tuning the learning rate (which is an example of a "hyperparameter") can make a big difference to the algorithm. You will see more examples of this later in this course.
Finally, if you'd like, we invite you to try different things on this notebook. Make sure you submit before trying anything; once you have submitted, things you can play with include:
- Play with the learning rate and the number of iterations
- Try different initialization methods and compare the results (see the sketch after this list)
- Test other preprocessings (center the data, or divide each row by its standard deviation)
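A minimal sketch of the second suggestion (my own illustration, not graded code; the helper name initialize_randomly is hypothetical): initialize w with small random values instead of zeros, feed it through the same training pipeline, then compare the resulting costs and accuracies.

def initialize_randomly(dim, scale=0.01):
    # small random weights instead of zeros; the bias stays 0
    w = np.random.randn(dim, 1) * scale
    b = 0.
    return w, b

w, b = initialize_randomly(train_set_x.shape[0])
parameters, grads, costs = optimize(w, b, train_set_x, train_set_y,
                                    num_iterations=2000, learning_rate=0.005, print_cost=False)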
Bibliography:
- http://www.wildml.com/2015/09/implementing-a-neural-network-from-scratch/