Neural Networks and Deep Learning, Week 2: Logistic Regression with a Neural Network Mindset
Main goals of this programming assignment
- The assignment walks you through building an image classification model.
- Along the way it covers initializing the model's parameters, evaluating a cost function, and optimizing the parameters with gradient descent.
You may find the following reference interesting:
http://www.wildml.com/2015/09/implementing-a-neural-network-from-scratch/ This article walks through building a neural network from scratch and is a good way to deepen your understanding of the details; the accompanying code repository on GitHub contains the full implementation.
It is also meant to help you understand the motivation behind the image standardization we perform later.
Overall workflow
1. Import the required packages.
2. Load the dataset and preprocess its shape.
3. General architecture of the learning algorithm: initialize the model's parameters, train the model to optimize those parameters, use the learned parameters to make predictions, and analyze the results.
4. Build our own neural network:
   - 4.1 Helper function: sigmoid
   - 4.2 Initialize the model parameters: initialize_with_zeros(dim)
   - 4.3 Forward and backward propagation: propagate(w, b, X, Y). Note that this assignment uses a single-layer network (no hidden layer), so forward and backward propagation are merged into one function; normally they are handled separately, as the code in the coming weeks will show.
   - 4.4 Train the parameters with optimize(w, b, X, Y, num_iterations, learning_rate, print_cost = False) and predict labels with predict(w, b, X).
5. Merge everything into model(X_train, Y_train, X_test, Y_test), which by default trains for 2000 iterations with a learning rate of 0.5 and does not print the cost.
6. Further analysis: once the assignment is done, take a closer look at gradient descent, just as in the machine learning course. Try changing the learning rate: if it is too large the cost oscillates, if it is too small convergence is slow.
7. Test the model on our own images.
Before each code cell you need to complete, I will note the goal of the task.
Logistic Regression with a Neural Network mindset
Welcome to your first (mandatory) programming coursework! You are tasked with constructing a logistic regression classifier capable of identifying cat images. This coursework will guide you through the process of implementing this using a neural network perspective, while also fostering an intuitive understanding of deep learning concepts.
Instructions:
Do not use loops (for/while) in your code unless the instructions explicitly ask you to.
You will learn to:
- Build the general architecture of a learning algorithm, including:
  - Initializing parameters
  - Calculating the cost function and its gradient
  - Using an optimization algorithm (gradient descent)
- Gather all three functions above into a main model function, in the right order.
Updates
This notebook has been updated over the past few months. The prior version was named "v5", and the current version is "v6a".
If you were working on a previous version:
- You can find your prior work by looking in the file directory for the older files (named by version name).
- To view the file directory, click on the "Coursera" icon in the top left corner of this notebook.
- Please copy your work from the older versions into the most recent notebook before submitting.
List of Updates
- The forward computation process has been adjusted to begin indexing with 1 rather than 0.
- In the comments of the optimize function, the instruction now reads "Print the cost every 100 training iterations" instead of "examples".
- Grammar issues within the comments section have been corrected.
- The Y_prediction_test variable name has been consistently utilized across all code sections.
- The plot's x-axis label now reads "iterations (hundreds)" rather than just "iterations".
- During testing, images are standardized by dividing each pixel value by 255.
1 - Packages
Initially, let us execute the code in the cell below to import all the necessary packages for this assignment.
- numpy plays a vital role in scientific computing using Python.
- h5py enables users to work with data stored in H5 files.
- matplotlib offers comprehensive tools for generating publication-quality graphs in Python.
- PIL and scipy are used here to test your model with your own picture at the end.
import numpy as np
import matplotlib.pyplot as plt
import h5py
import scipy
from PIL import Image
from scipy import ndimage
from lr_utils import load_dataset
%matplotlib inline
2 - Overview of the Problem set
Problem Statement : You are given a dataset ("data.h5") containing:
- a training set of m_train images labeled as cat (y=1) or non-cat (y=0)
- a test set of m_test images labeled as cat or non-cat
- each image is of shape (num_px, num_px, 3) where 3 is for the 3 channels (RGB). Thus, each image is square (height = num_px) and (width = num_px).
You will build a simple image-recognition algorithm that correctly classifies pictures as cat or non-cat.
Let's get more familiar with the dataset. Load the data by running the code below.
# Loading the data (cat/non-cat)
train_set_x_orig, train_set_y, test_set_x_orig, test_set_y, classes = load_dataset()
We included '_orig' at the end of image datasets (train and test) since we are going to preprocess them. After preprocessing, we will end up with train_set_x and test_set_x (the labels train_set_y and test_set_y do not require any preprocessing).
Each line in your train_set_x_orig and test_set_x_orig represents an image as an array. You are able to visualize a sample by executing the provided code. Please also feel free to adjust the index value and re-run the code to view additional images.
# Example of a picture
index = 25
plt.imshow(train_set_x_orig[index])
print ("y = " + str(train_set_y[:, index]) + ", it's a '" + classes[np.squeeze(train_set_y[:, index])].decode("utf-8") + "' picture.")
Many software bugs in deep learning come from matrix/vector dimensions that don't fit. If you can keep your matrix and vector dimensions straight, you will go a long way toward eliminating many bugs.
Exercise: Find the values for:
- m_train (number of training examples)
- m_test (number of test examples)
- num_px (= height = width of a training image)
Remember that train_set_x_orig is a numpy array of shape (m_train, num_px, num_px, 3). For instance, you can access m_train by writing train_set_x_orig.shape[0].
Task: extract m_train, m_test, and num_px.
Hint: look at the shape of train_set_x_orig.
### START CODE HERE ### (≈ 3 lines of code)
m_train = train_set_x_orig.shape[0]
m_test = test_set_x_orig.shape[0]
num_px = train_set_x_orig.shape[1]
### END CODE HERE ###
print ("Number of training examples: m_train = " + str(m_train))
print ("Number of testing examples: m_test = " + str(m_test))
print ("Height/Width of each image: num_px = " + str(num_px))
print ("Each image is of size: (" + str(num_px) + ", " + str(num_px) + ", 3)")
print ("train_set_x shape: " + str(train_set_x_orig.shape))
print ("train_set_y shape: " + str(train_set_y.shape))
print ("test_set_x shape: " + str(test_set_x_orig.shape))
print ("test_set_y shape: " + str(test_set_y.shape))
Expected Output for m_train, m_test and num_px :
| m_train | 209 |
|---|---|
| m_test | 50 |
| num_px | 64 |
For convenience, you should now reshape images of shape (num_px, num_px, 3) into a numpy array of shape (num_px * num_px * 3, 1). After this, our training and test datasets are numpy arrays where each column represents a flattened image. There should be m_train (respectively m_test) columns.
Exercise: Reshape the training and test datasets so that each image of shape (num_px, num_px, 3) is flattened into a single vector of shape (num_px * num_px * 3, 1).
A trick when you want to flatten a matrix X of shape (a, b, c, d) into a matrix X_flatten of shape (b*c*d, a) is to use:
X_flatten = X.reshape(X.shape[0], -1).T # X.T is the transpose of X
Task: reshape the original data.
Hint: to reshape X of shape (a, b, c, d) into shape (b*c*d, a), use X_flatten = X.reshape(X.shape[0], -1).T.
# Reshape the training and test examples
### START CODE HERE ### (≈ 2 lines of code)
train_set_x_flatten = train_set_x_orig.reshape(train_set_x_orig.shape[0],-1).T
test_set_x_flatten = test_set_x_orig.reshape(test_set_x_orig.shape[0],-1).T
### END CODE HERE ###
print ("train_set_x_flatten shape: " + str(train_set_x_flatten.shape))
print ("train_set_y shape: " + str(train_set_y.shape))
print ("test_set_x_flatten shape: " + str(test_set_x_flatten.shape))
print ("test_set_y shape: " + str(test_set_y.shape))
print ("sanity check after reshaping: " + str(train_set_x_flatten[0:5,0]))
Expected Output :
| train_set_x_flatten shape | (12288, 209) |
|---|---|
| train_set_y shape | (1, 209) |
| test_set_x_flatten shape | (12288, 50) |
| test_set_y shape | (1, 50) |
| sanity check after reshaping | [17 31 56 22 33] |
To represent color images, the red, green, and blue channels (RGB) must be specified for each pixel, so each pixel is actually a vector of three numbers ranging from 0 to 255.
One common preprocessing step in machine learning is to center and standardize your dataset: subtract the mean of the whole numpy array from each example, then divide each example by the standard deviation of the whole array. For picture datasets, it is simpler and works almost as well to just divide every row of the dataset by 255 (the maximum value of a pixel channel).
Let's standardize our dataset.
train_set_x = train_set_x_flatten/255.
test_set_x = test_set_x_flatten/255.
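For reference only, here is a minimal sketch of the more general standardization described above (subtract the mean, divide by the standard deviation). The rest of the assignment keeps the simple division by 255, and the variable names ending in _std are mine:

# Illustrative sketch: center and scale using statistics computed on the training set only
mean = train_set_x_flatten.mean()
std = train_set_x_flatten.std()
train_set_x_std = (train_set_x_flatten - mean) / std
test_set_x_std = (test_set_x_flatten - mean) / std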
What you need to remember:
Common steps for pre-processing a new dataset are:
- Figure out the dimensions and shapes of the problem (m_train, m_test, num_px, ...).
- Reshape the datasets so that each example is now a vector of size (num_px * num_px * 3, 1).
- Standardize the data.
3 - General Architecture of the learning algorithm
It's time to design a simple algorithm to distinguish cat images from non-cat images.
You will build a logistic regression model using a neural network mindset. The following figure explains why logistic regression is actually a very simple neural network.

Mathematical expression of the algorithm:
For one example $x^{(i)}$:
$$z^{(i)} = w^T x^{(i)} + b \tag{1}$$
$$\hat{y}^{(i)} = a^{(i)} = \mathrm{sigmoid}(z^{(i)}) \tag{2}$$
$$\mathcal{L}(a^{(i)}, y^{(i)}) = -y^{(i)}\log(a^{(i)}) - (1 - y^{(i)})\log(1 - a^{(i)}) \tag{3}$$
The cost is then computed by summing over all training examples:
$$J = \frac{1}{m}\sum_{i=1}^{m} \mathcal{L}(a^{(i)}, y^{(i)}) \tag{4}$$
Key steps : In this exercise, you will carry out the following steps:
- Initialize the parameters of the model
- Learn the parameters for the model by minimizing the cost
- Use the learned parameters to make predictions (on the test set)
- Analyse the results and conclude
4 - Building the parts of our algorithm
The main steps for building a Neural Network are:
- Define the model structure (such as the number of input features)
- Initialize the model's parameters
- Loop:
  - Calculate the current loss (forward propagation)
  - Calculate the current gradient (backward propagation)
  - Update the parameters (gradient descent)
You often build 1-3 separately and then integrate them into one function called model().
4.1 - Helper functions
Exercise: Using your code from "Python Basics", implement sigmoid(). As the figure above shows, you need to compute
$$\mathrm{sigmoid}(w^T x + b) = \frac{1}{1 + e^{-(w^T x + b)}}$$
to make predictions. Use np.exp().
Task: implement the sigmoid function. Hint: use np.exp(). If you completed the previous lab, this should already be familiar.
# GRADED FUNCTION: sigmoid
def sigmoid(z):
    """
    Compute the sigmoid of z

    Arguments:
    z -- A scalar or numpy array of any size.

    Return:
    s -- sigmoid(z)
    """
    ### START CODE HERE ### (≈ 1 line of code)
    s = 1/(1+np.exp(-z))
    ### END CODE HERE ###

    return s
print ("sigmoid([0, 2]) = " + str(sigmoid(np.array([0,2]))))
Expected Output :
| sigmoid([0, 2]) | [ 0.5 0.88079708] |
|---|
4.2 - Initializing parameters
Exercise: Implement parameter initialization in the cell below. You have to initialize w as a vector of zeros. If you don't know which numpy function to use, look up np.zeros() in the documentation.
Task: initialize the parameters to zeros, keeping an eye on their shapes. If you forget or are unsure of the details, refer to the annotated code below. For this assignment you can ignore the symmetry-breaking issue; zero initialization is fine for logistic regression.
# GRADED FUNCTION: initialize_with_zeros
def initialize_with_zeros(dim):
    """
    This function creates a vector of zeros of shape (dim, 1) for w and initializes b to 0.

    Argument:
    dim -- size of the w vector we want (or number of parameters in this case)

    Returns:
    w -- initialized vector of shape (dim, 1)
    b -- initialized scalar (corresponds to the bias)
    """
    ### START CODE HERE ### (≈ 1 line of code)
    w = np.zeros((dim, 1))
    b = 0
    ### END CODE HERE ###

    assert(w.shape == (dim, 1))
    assert(isinstance(b, float) or isinstance(b, int))

    return w, b
dim = 2
w, b = initialize_with_zeros(dim)
print ("w = " + str(w))
print ("b = " + str(b))
Expected Output :
| w | [[ 0.] [ 0.]] |
|---|---|
| b | 0 |
For image inputs, w will be of shape (num_px * num_px * 3, 1).
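As a minimal check of that statement (my own example; it assumes num_px = 64 as in this dataset, so the vector has 64 * 64 * 3 = 12288 entries):

w, b = initialize_with_zeros(num_px * num_px * 3)
print(w.shape)   # (12288, 1)
print(b)         # 0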
4.3 - Forward and Backward propagation
Once your variables have been initialized, you can perform the forward and backward propagation procedures for learning the model’s weights.
Exercise: Write a code block named propagate() to compute the cost function and its gradient.
Hints:
Forward Propagation:
- You get X
- You compute $A = \sigma(w^T X + b) = (a^{(1)}, a^{(2)}, \dots, a^{(m)})$
- You calculate the cost function: $J = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log(a^{(i)}) + (1 - y^{(i)})\log(1 - a^{(i)})\right]$
Here are the two formulas you will be using:
$$\frac{\partial J}{\partial w} = \frac{1}{m} X (A - Y)^T$$
$$\frac{\partial J}{\partial b} = \frac{1}{m}\sum_{i=1}^{m} (a^{(i)} - y^{(i)})$$
Task: compute the cost J and the backward results dw and db.
Hint: compute the activation A in the forward pass first, then reuse it for both the cost and the gradients.
# GRADED FUNCTION: propagate
def propagate(w, b, X, Y):
    """
    Implement the cost function and its gradient for the propagation explained above

    Arguments:
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of size (num_px * num_px * 3, number of examples)
    Y -- true "label" vector (containing 0 if non-cat, 1 if cat) of size (1, number of examples)

    Return:
    cost -- negative log-likelihood cost for logistic regression
    dw -- gradient of the loss with respect to w, thus same shape as w
    db -- gradient of the loss with respect to b, thus same shape as b

    Tips:
    - Write your code step by step for the propagation. np.log(), np.dot()
    """
    m = X.shape[1]

    # FORWARD PROPAGATION (FROM X TO COST)
    ### START CODE HERE ### (≈ 2 lines of code)
    A = sigmoid(np.dot(w.T, X) + b)                            # compute activation
    cost = -1/m * np.sum(Y*np.log(A) + (1-Y)*np.log(1-A))      # compute cost
    ### END CODE HERE ###

    # BACKWARD PROPAGATION (TO FIND GRAD)
    ### START CODE HERE ### (≈ 2 lines of code)
    dw = 1/m * np.dot(X, (A-Y).T)
    db = 1/m * np.sum(A-Y)
    ### END CODE HERE ###

    assert(dw.shape == w.shape)
    assert(db.dtype == float)
    cost = np.squeeze(cost)
    assert(cost.shape == ())

    grads = {"dw": dw,
             "db": db}

    return grads, cost
w, b, X, Y = np.array([[1.],[2.]]), 2., np.array([[1.,2.,-1.],[3.,4.,-3.2]]), np.array([[1,0,1]])
grads, cost = propagate(w, b, X, Y)
print ("dw = " + str(grads["dw"]))
print ("db = " + str(grads["db"]))
print ("cost = " + str(cost))
Expected Output :
| dw | [[ 0.99845601] [ 2.39507239]] |
|---|---|
| db | 0.00145557813678 |
| cost | 5.801545319394553 |
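As an optional sanity check (not part of the graded notebook), you can compare the analytic gradients against a numerical finite-difference estimate; the sketch below reuses propagate and the small w, b, X, Y defined in the test cell above, and only checks db:

# Illustrative numerical gradient check for db
eps = 1e-7
grads, cost = propagate(w, b, X, Y)
_, cost_plus = propagate(w, b + eps, X, Y)
_, cost_minus = propagate(w, b - eps, X, Y)
db_approx = (cost_plus - cost_minus) / (2 * eps)
print(db_approx, grads["db"])   # the two values should be very close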
4.4 - Optimization
- You have initialized your parameters.
- You can compute the cost function and its gradient.
- Now, you want to update the parameters using gradient descent.
Exercise: Write down the optimization function. The goal is to learn w and b by minimizing the cost function J. For a parameter θ, the update rule is θ = θ − α dθ, where α is the learning rate.
Task: complete the (vectorized) training loop.
Note that propagate(w, b, X, Y) has already been implemented.
Recall the overall process: run forward propagation to compute the cost J, then backward propagation to get the gradients and update the parameters; repeat this num_iterations times to train the model.
# GRADED FUNCTION: optimize
def optimize(w, b, X, Y, num_iterations, learning_rate, print_cost = False):
    """
    This function optimizes w and b by running a gradient descent algorithm

    Arguments:
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of shape (num_px * num_px * 3, number of examples)
    Y -- true "label" vector (containing 0 if non-cat, 1 if cat), of shape (1, number of examples)
    num_iterations -- number of iterations of the optimization loop
    learning_rate -- learning rate of the gradient descent update rule
    print_cost -- True to print the loss every 100 steps

    Returns:
    params -- dictionary containing the weights w and bias b
    grads -- dictionary containing the gradients of the weights and bias with respect to the cost function
    costs -- list of all the costs computed during the optimization, this will be used to plot the learning curve.

    Tips:
    You basically need to write down two steps and iterate through them:
        1) Calculate the cost and the gradient for the current parameters. Use propagate().
        2) Update the parameters using gradient descent rule for w and b.
    """
    costs = []

    for i in range(num_iterations):

        # Cost and gradient calculation (≈ 1-4 lines of code)
        ### START CODE HERE ###
        grads, cost = propagate(w, b, X, Y)
        ### END CODE HERE ###

        # Retrieve derivatives from grads
        dw = grads["dw"]
        db = grads["db"]

        # Update rule (≈ 2 lines of code)
        ### START CODE HERE ###
        w = w - learning_rate * dw
        b = b - learning_rate * db
        ### END CODE HERE ###

        # Record the costs
        if i % 100 == 0:
            costs.append(cost)

        # Print the cost every 100 training iterations
        if print_cost and i % 100 == 0:
            print ("Cost after iteration %i: %f" %(i, cost))

    params = {"w": w,
              "b": b}

    grads = {"dw": dw,
             "db": db}

    return params, grads, costs
params, grads, costs = optimize(w, b, X, Y, num_iterations= 100, learning_rate = 0.009, print_cost = False)
print ("w = " + str(params["w"]))
print ("b = " + str(params["b"]))
print ("dw = " + str(grads["dw"]))
print ("db = " + str(grads["db"]))
Expected Output :
| w | [[ 0.19033591] [ 0.12259159]] |
|---|---|
| b | 1.92535983008 |
| dw | [[ 0.67752042] [ 1.41625495]] |
| db | 0.219194504541 |
Exercise: The previous function outputs the learned w and b. We can use w and b to predict the labels for a dataset X. Implement the predict() function. Computing predictions involves two steps:
1. Calculate $\hat{Y} = A = \sigma(w^T X + b)$.
2. Convert each entry of A into 0 (if the activation is <= 0.5) or 1 (if the activation is > 0.5), and store the predictions in the vector Y_prediction. If you wish, you can use an if/else statement inside a for loop (though there is also a way to vectorize this; see the sketch below).
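As a side sketch of the vectorized alternative mentioned above (mine, not the graded solution, which asks for the loop), assuming A has already been computed:

Y_prediction = (A > 0.5).astype(float)   # shape (1, m), entries 0.0 or 1.0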
Once the parameters have been trained, we use them to make predictions on images.
# GRADED FUNCTION: predict
def predict(w, b, X):
    '''
    Predict whether the label is 0 or 1 using learned logistic regression parameters (w, b)

    Arguments:
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of size (num_px * num_px * 3, number of examples)

    Returns:
    Y_prediction -- a numpy array (vector) containing all predictions (0/1) for the examples in X
    '''
    m = X.shape[1]
    Y_prediction = np.zeros((1, m))
    w = w.reshape(X.shape[0], 1)

    # Compute vector "A" predicting the probabilities of a cat being present in the picture
    ### START CODE HERE ### (≈ 1 line of code)
    A = sigmoid(np.dot(w.T, X) + b)
    ### END CODE HERE ###

    for i in range(A.shape[1]):

        # Convert probabilities A[0,i] to actual predictions p[0,i]
        ### START CODE HERE ### (≈ 4 lines of code)
        if A[0, i] < 0.5:
            Y_prediction[0, i] = 0
        else:
            Y_prediction[0, i] = 1
        ### END CODE HERE ###

    assert(Y_prediction.shape == (1, m))

    return Y_prediction
w = np.array([[0.1124579],[0.23106775]])
b = -0.3
X = np.array([[1.,-1.1,-3.2],[1.2,2.,0.1]])
print ("predictions = " + str(predict(w, b, X)))
Expected Output :
| predictions | [[ 1. 1. 0.]] |
|---|
What to remember: You've implemented several functions that:
- Set initial values for (w, b)
- Iteratively minimize the cost function to determine optimal values for both w and b:
- Calculate both the cost and its gradient with respect to w and b
- Update w and b using gradient descent method
- Employ these learned parameters to predict labels for new data instances
5 - Merge all functions into a model
You will now see how the overall model is structured by putting together all the building blocks (the functions implemented in the previous parts), in the right order.
Exercise: Implement the model function. Use the following notation:
- Y_prediction_test for your predictions on the test set
- Y_prediction_train for your predictions on the train set
- w, costs, grads for the outputs of optimize()
Now that all the pieces are complete, we need to put them together: initialization, training, and prediction. Note that the purpose of training is to obtain the parameters.
Initialize the weights as a zero vector with initialize_with_zeros(dim), optimize the parameters with gradient descent via optimize(w, b, X, Y, num_iterations, learning_rate, print_cost=False), and then call the prediction method predict(w, b, X).
# GRADED FUNCTION: model
def model(X_train, Y_train, X_test, Y_test, num_iterations = 2000, learning_rate = 0.5, print_cost = False):
    """
    Builds the logistic regression model by calling the function you've implemented previously

    Arguments:
    X_train -- training set represented by a numpy array of shape (num_px * num_px * 3, m_train)
    Y_train -- training labels represented by a numpy array (vector) of shape (1, m_train)
    X_test -- test set represented by a numpy array of shape (num_px * num_px * 3, m_test)
    Y_test -- test labels represented by a numpy array (vector) of shape (1, m_test)
    num_iterations -- hyperparameter representing the number of iterations to optimize the parameters
    learning_rate -- hyperparameter representing the learning rate used in the update rule of optimize()
    print_cost -- Set to true to print the cost every 100 iterations

    Returns:
    d -- dictionary containing information about the model.
    """
    ### START CODE HERE ###

    # initialize parameters with zeros (≈ 1 line of code)
    w, b = initialize_with_zeros(X_train.shape[0])

    # Gradient descent (≈ 1 line of code)
    parameters, grads, costs = optimize(w, b, X_train, Y_train, num_iterations, learning_rate, print_cost)

    # Retrieve parameters w and b from dictionary "parameters"
    w = parameters["w"]
    b = parameters["b"]

    # Predict test/train set examples (≈ 2 lines of code)
    Y_prediction_test = predict(w, b, X_test)
    Y_prediction_train = predict(w, b, X_train)

    ### END CODE HERE ###

    # Print train/test Errors
    print("train accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_train - Y_train)) * 100))
    print("test accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_test - Y_test)) * 100))

    d = {"costs": costs,
         "Y_prediction_test": Y_prediction_test,
         "Y_prediction_train" : Y_prediction_train,
         "w" : w,
         "b" : b,
         "learning_rate" : learning_rate,
         "num_iterations": num_iterations}

    return d
Run the following cell to train your model.
d = model(train_set_x, train_set_y, test_set_x, test_set_y, num_iterations = 2000, learning_rate = 0.005, print_cost = True)
Expected Output :
| Cost after iteration 0 | 0.693147 |
|---|---|
| ⋮⋮ | ⋮⋮ |
| Train Accuracy | 99.04306220095694 % |
| Test Accuracy | 70.0 % |
Comment: Training accuracy is close to 100%. This is a good sanity check: your model is working and has enough capacity to fit the training data. Test accuracy is 70%, which is actually not bad for such a simple model, given the small dataset we used and the fact that logistic regression is a linear classifier. No worries, you will build an even better classifier next week!
You can also see that the model is clearly overfitting the training data. Later in this specialization you will learn how to reduce overfitting, for example by using regularization. Using the code below (and changing the index variable), you can look at predictions on pictures from the test set.
# Example of a picture that was wrongly classified.
index = 1
plt.imshow(test_set_x[:,index].reshape((num_px, num_px, 3)))
print ("y = " + str(test_set_y[0,index]) + ", you predicted that it is a \"" + classes[int(d["Y_prediction_test"][0,index])].decode("utf-8") + "\" picture.")
Let's also plot the cost function and the gradients.
# Plot learning curve (with costs)
costs = np.squeeze(d['costs'])
plt.plot(costs)
plt.ylabel('cost')
plt.xlabel('iterations (per hundreds)')
plt.title("Learning rate =" + str(d["learning_rate"]))
plt.show()
Interpretation: You can see the cost decreasing, which shows that the parameters are being learned. However, you could train the model even more on the training set: try increasing the number of iterations in the cell above and rerunning the cells. You might then see the training set accuracy go up while the test set accuracy goes down; this is called overfitting.
6 - Further analysis (optional/ungraded exercise)
Congratulations on the successful creation of your first image classification model. Let us conduct a thorough examination of this model and evaluate various options for the learning rate α.
Choice of learning rate
Reminder: In order for gradient descent to work well, you must choose the learning rate wisely. The learning rate α determines how quickly we update the parameters. If it is too large, we may overshoot the optimal value; if it is too small, we will need many iterations to converge to the best values. That is why a well-tuned learning rate is crucial.
Let's compare the learning curve of our model with several choices of the learning rate. Run the cell below (this should take about one minute); also feel free to try values other than the three we have initialized in the learning_rates variable, and see what happens.
learning_rates = [0.01, 0.001, 0.0001]
models = {}
for i in learning_rates:
    print ("learning rate is: " + str(i))
    models[str(i)] = model(train_set_x, train_set_y, test_set_x, test_set_y, num_iterations = 1500, learning_rate = i, print_cost = False)
    print ('\n' + "-------------------------------------------------------" + '\n')

for i in learning_rates:
    plt.plot(np.squeeze(models[str(i)]["costs"]), label= str(models[str(i)]["learning_rate"]))

plt.ylabel('cost')
plt.xlabel('iterations (hundreds)')

legend = plt.legend(loc='upper center', shadow=True)
frame = legend.get_frame()
frame.set_facecolor('0.90')
plt.show()
Interpretation :
- Different learning rates yield distinct costs and consequently varying prediction outcomes.
- When the learning rate is set too high (e.g., 0.01), the cost may oscillate up and down, potentially diverging (though in this case, using 0.01 still eventually leads to a satisfactory cost value).
- A lower cost does not necessarily indicate a superior model; it's essential to verify for potential overfitting, which typically occurs when training accuracy significantly exceeds test accuracy.
- In deep learning practice:
- Optimize the learning rate to effectively minimize the cost function.
- If your model exhibits signs of overfitting, consider employing other techniques to reduce overfitting.
- We will delve deeper into these methods in subsequent lessons.
7 - Test with your own image (optional/ungraded exercise)
Congratulations on finishing this assignment. You can now use your own image and see the output of your model. To do that:
1. Click on "File" in the upper bar of this notebook, then click "Open" to go to your Coursera Hub.
2. Add your image to this Jupyter Notebook's directory, in the "images" folder
3. Change your image's name in the following code
4. Run the code and check if the algorithm is right (1 = cat, 0 = non-cat)!
## START CODE HERE ## (PUT YOUR IMAGE NAME)
my_image = "my_image.jpg" # change this to the name of your image file
## END CODE HERE ##
# We preprocess the image to fit your algorithm.
fname = "images/" + my_image
image = np.array(ndimage.imread(fname, flatten=False))
image = image/255.
my_image = scipy.misc.imresize(image, size=(num_px,num_px)).reshape((1, num_px*num_px*3)).T
my_predicted_image = predict(d["w"], d["b"], my_image)
plt.imshow(image)
print("y = " + str(np.squeeze(my_predicted_image)) + ", your algorithm predicts a \"" + classes[int(np.squeeze(my_predicted_image)),].decode("utf-8") + "\" picture.")
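Note that ndimage.imread and scipy.misc.imresize were removed from newer SciPy releases, so the cell above only runs in the course's original environment. If you try this elsewhere, a rough equivalent using PIL (already imported above) might look like the following sketch; the file name is just a placeholder:

# Sketch: load and resize an image with PIL instead of the removed scipy helpers
fname = "images/my_image.jpg"
img = Image.open(fname).convert("RGB").resize((num_px, num_px))
image = np.array(img) / 255.
my_image = image.reshape((1, num_px * num_px * 3)).T
my_predicted_image = predict(d["w"], d["b"], my_image)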
What to remember from this assignment:
- Preprocessing the dataset is an important step in the machine learning pipeline.
- You implemented each function separately: initialize(), propagate(), optimize(). Then you built a complete model().
- Tuning the learning rate (which is an example of a "hyperparameter") can make a big difference to the algorithm. You will see more examples of this later in this course.
Finally, if you'd like, we invite you to try different things on this notebook. Make sure you submit before trying anything; once you have submitted, things you can play with include:
- Play with the learning rate and the number of iterations
- Try different initialization methods and compare the results (see the sketch after this list)
- Test other preprocessings (center the data, or divide each row by its standard deviation)
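A minimal sketch of the second suggestion (my own illustration, not graded code; the helper name initialize_randomly is hypothetical): initialize w with small random values instead of zeros, feed it through the same training pipeline, then compare the resulting costs and accuracies.

def initialize_randomly(dim, scale=0.01):
    # small random weights instead of zeros; the bias stays 0
    w = np.random.randn(dim, 1) * scale
    b = 0.
    return w, b

w, b = initialize_randomly(train_set_x.shape[0])
parameters, grads, costs = optimize(w, b, train_set_x, train_set_y,
                                    num_iterations=2000, learning_rate=0.005, print_cost=False)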
Bibliography:
- http://www.wildml.com/2015/09/implementing-a-neural-network-from-scratch/