
Deep Learning: Autonomous driving application - Car detection (code implementation)


Autonomous driving - Car detection

Welcome to your week 3 programming assignment. You will learn about object detection using the very powerful YOLO model. Many of the ideas in this notebook are described in the two YOLO papers: Redmon et al., 2016 (https://arxiv.org/abs/1506.02640) and Redmon and Farhadi, 2016 (https://arxiv.org/abs/1612.08242).

You will learn to:

  • Use object detection on a car detection dataset
  • Deal with bounding boxes

Run the following cell to load the packages and dependencies that are going to be useful for your journey!

    import argparse
    import os
    import matplotlib.pyplot as plt
    from matplotlib.pyplot import imshow
    import scipy.io
    import scipy.misc
    import numpy as np
    import pandas as pd
    import PIL
    import tensorflow as tf
    from keras import backend as K
    from keras.layers import Input, Lambda, Conv2D
    from keras.models import load_model, Model
    from yolo_utils import read_classes, read_anchors, generate_colors, preprocess_image, draw_boxes, scale_boxes
    from yad2k.models.keras_yolo import yolo_head, yolo_boxes_to_corners, preprocess_true_boxes, yolo_loss, yolo_body

    %matplotlib inline

1 - Problem Statement

You are working on a self-driving car. As a critical component of this project, you'd like to first build a car detection system. To collect data, you've mounted a camera to the hood (meaning the front) of the car, which takes pictures of the road ahead every few seconds while you drive around.

2 - YOLO

YOLO ("you only look once") is a popular algorithm because it achieves high accuracy while also being able to run in real-time. This algorithm "only looks once" at the image in the sense that it requires only one forward propagation pass through the network to make predictions. After non-max suppression, it then outputs recognized objects together with the bounding boxes.

2.1 - Model details

First things to know:

  • The input is a batch of images of shape (m, 608, 608, 3)
  • The output is a list of bounding boxes along with the recognized classes. Each bounding box is represented by 6 numbers (p_c, b_x, b_y, b_h, b_w, c). If you expand c into an 80-dimensional vector, each bounding box is then represented by 85 numbers.

We will use 5 anchor boxes. So you can think of the YOLO architecture as the following: IMAGE (m, 608, 608, 3) -> DEEP CNN -> ENCODING (m, 19, 19, 5, 85).
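To make the encoding concrete, here is a minimal NumPy sketch (random data stands in for real network output, and the channel ordering shown in the slices is an assumption matching the docstrings later in this notebook) of how a flat (19, 19, 425) output regroups into (19, 19, 5, 85):

```python
import numpy as np

# Hypothetical raw network output for one image: a 19x19 grid with
# 425 = 5 anchors * 85 channels per grid cell
raw = np.random.randn(19, 19, 425)

# Regroup into (grid_h, grid_w, anchors, 5 box params + 80 class scores)
encoding = raw.reshape(19, 19, 5, 85)

box_confidence = encoding[..., 0:1]   # (19, 19, 5, 1)  -- p_c
box_xywh       = encoding[..., 1:5]   # (19, 19, 5, 4)  -- b_x, b_y, b_h, b_w
class_scores   = encoding[..., 5:]    # (19, 19, 5, 80) -- c_1 ... c_80

print(box_confidence.shape, box_xywh.shape, class_scores.shape)
```

This is only a shape exercise; the real model produces these tensors directly via `yolo_head`.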

Let's look in greater detail at what this encoding represents.

2.2 - Filtering with a threshold on class scores

You are going to apply a first filter by thresholding. You would like to get rid of any box for which the class "score" is less than a chosen threshold.

The model gives you a total of 19x19x5x85 numbers, with each box described by 85 numbers. It'll be convenient to rearrange the (19,19,5,85) (or (19,19,425)) dimensional tensor into the following variables:

  • box_confidence: tensor of shape (19, 19, 5, 1) containing p_c (confidence probability that there's some object) for each of the 5 boxes predicted in each of the 19x19 cells.
  • boxes: tensor of shape (19, 19, 5, 4) containing (b_x, b_y, b_h, b_w) for each of the 5 boxes per cell.
  • box_class_probs: tensor of shape (19, 19, 5, 80) containing the detection probabilities (c_1, ..., c_80) for each of the 80 classes for each of the 5 boxes per cell.

Exercise : Implement yolo_filter_boxes().

  1. Compute box scores by doing the elementwise product as described in Figure 4. The following code may help you choose the right operator:

        a = np.random.randn(19*19, 5, 1)
        b = np.random.randn(19*19, 5, 80)
        c = a * b # shape of c will be (19*19, 5, 80), thanks to broadcasting

  2. For each box, find:
    • the index of the class with the maximum box score (be careful with which axis you choose; consider using axis=-1)
    • the corresponding box score (again, be careful with the axis; consider using axis=-1)
  3. Create a mask by using a threshold. As a reminder: ([0.9, 0.3, 0.4, 0.5, 0.1] < 0.4) returns: [False, True, False, False, True]. The mask should be True for the boxes you want to keep.
  4. Use TensorFlow to apply the mask to box_class_scores, boxes and box_classes to filter out the boxes we don't want. You should be left with just the subset of boxes you want to keep.
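The four steps above can be sketched in plain NumPy (boolean indexing stands in for `tf.boolean_mask`; shapes and names mirror the graded function, but the data here is random):

```python
import numpy as np

np.random.seed(0)
box_confidence  = np.random.rand(19, 19, 5, 1)    # stand-in for p_c
boxes           = np.random.randn(19, 19, 5, 4)   # stand-in for box coordinates
box_class_probs = np.random.rand(19, 19, 5, 80)   # stand-in for class probabilities

# Step 1: elementwise product broadcasts (..., 5, 1) against (..., 5, 80)
box_scores = box_confidence * box_class_probs      # (19, 19, 5, 80)

# Step 2: best class and its score for each box
box_classes      = np.argmax(box_scores, axis=-1)  # (19, 19, 5)
box_class_scores = np.max(box_scores, axis=-1)     # (19, 19, 5)

# Step 3: keep boxes whose best score clears the threshold
mask = box_class_scores >= 0.6                     # boolean, (19, 19, 5)

# Step 4: boolean indexing flattens the grid, keeping only selected boxes
scores  = box_class_scores[mask]                   # (None,)
kept    = boxes[mask]                              # (None, 4)
classes = box_classes[mask]                        # (None,)

print(scores.shape, kept.shape, classes.shape)
```

The TensorFlow version below is structurally identical; `tf.boolean_mask` plays the role of the `[mask]` indexing.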

Reminder: to call a Keras function, you should use K.function(...).

    # GRADED FUNCTION: yolo_filter_boxes

    def yolo_filter_boxes(box_confidence, boxes, box_class_probs, threshold = .6):
        """Filters YOLO boxes by thresholding on object and class confidence.

        Arguments:
        box_confidence -- tensor of shape (19, 19, 5, 1)
        boxes -- tensor of shape (19, 19, 5, 4)
        box_class_probs -- tensor of shape (19, 19, 5, 80)
        threshold -- real value, if [highest class probability score < threshold], then get rid of the corresponding box

        Returns:
        scores -- tensor of shape (None,), containing the class probability score for selected boxes
        boxes -- tensor of shape (None, 4), containing (b_x, b_y, b_h, b_w) coordinates of selected boxes
        classes -- tensor of shape (None,), containing the index of the class detected by the selected boxes

        Note: "None" is here because you don't know the exact number of selected boxes, as it depends on the threshold.
        For example, the actual output size of scores would be (10,) if there are 10 boxes.
        """

        # Step 1: Compute box scores
        ### START CODE HERE ### (≈ 1 line)
        box_scores = box_confidence * box_class_probs
        ### END CODE HERE ###

        # Step 2: Find the box_classes thanks to the max box_scores, keep track of the corresponding score
        ### START CODE HERE ### (≈ 2 lines)
        box_classes = K.argmax(box_scores, axis=-1)
        box_class_scores = K.max(box_scores, axis=-1)
        ### END CODE HERE ###

        # Step 3: Create a filtering mask based on "box_class_scores" by using "threshold". The mask should have the
        # same dimension as box_class_scores, and be True for the boxes you want to keep (with probability >= threshold)
        ### START CODE HERE ### (≈ 1 line)
        filtering_mask = box_class_scores >= threshold
        ### END CODE HERE ###

        # Step 4: Apply the mask to scores, boxes and classes
        ### START CODE HERE ### (≈ 3 lines)
        scores = tf.boolean_mask(box_class_scores, filtering_mask)
        boxes = tf.boolean_mask(boxes, filtering_mask)
        classes = tf.boolean_mask(box_classes, filtering_mask)
        ### END CODE HERE ###

        return scores, boxes, classes
    tf.reset_default_graph()

    with tf.Session() as test:
        tf.set_random_seed(1)
        a_C = tf.random_normal([1, 4, 4, 3], mean=1, stddev=4)
        a_G = tf.random_normal([1, 4, 4, 3], mean=1, stddev=4)
        J_content = compute_content_cost(a_C, a_G)
        print("J_content = " + str(J_content.eval()))

Expected Output :

J_content 6.76559

What you should remember :

The content cost takes a hidden layer activation of the neural network, and measures how different a^(C) and a^(G) are. When we minimize the content cost later, this will help make sure G has similar content as C.

3.2.1 - Style matrix

The style matrix is also called a "Gram matrix." In linear algebra, the Gram matrix G of a set of vectors (v_1, ..., v_n) is the matrix of dot products, whose entries are G_ij = v_i^T v_j = np.dot(v_i, v_j). In other words, G_ij compares how similar v_i is to v_j: if they are highly similar, you would expect them to have a large dot product, and thus for G_ij to be large.

Note that there is an unfortunate collision in the variable names used here. We are following common terminology used in the literature, but G is used to denote the Style matrix (or Gram matrix) as well as to denote the generated image. We will try to make sure which G we are referring to is always clear from the context.

In NST, you can compute the Style matrix by multiplying the "unrolled" filter matrix A (of shape (n_C, n_H*n_W)) with its transpose: G_A = A A^T.

The result is a matrix of dimension (n_C, n_C), where n_C is the number of filters. The value G_ij measures how similar the activations of filter i are to the activations of filter j.

One important part of the Gram matrix is that the diagonal elements G_ii also measure how active filter i is. For example, suppose filter i is detecting vertical textures in the image. Then G_ii measures how common vertical textures are in the image as a whole: if G_ii is large, this means that the image has a lot of vertical texture.

By capturing the prevalence of different types of features (G_ii), as well as how much different features occur together (G_ij), the Style matrix measures the style of an image.

Exercise: Using TensorFlow, implement a function that computes the Gram matrix of a matrix A. The formula is: the Gram matrix of A is G_A = A A^T.
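As a quick sanity check, here is the same computation in NumPy on a tiny hand-picked matrix (entries chosen so the dot products are easy to verify by hand):

```python
import numpy as np

# Two "filters" (rows), each unrolled over 2 spatial positions
A = np.array([[1., 2.],
              [3., 4.]])   # shape (n_C, n_H*n_W) = (2, 2)

GA = A @ A.T               # Gram matrix, shape (n_C, n_C)
print(GA)
# [[ 5. 11.]   G_00 = 1*1 + 2*2 = 5  (how "active" filter 0 is)
#  [11. 25.]]  G_01 = 1*3 + 2*4 = 11 (co-occurrence of filters 0 and 1)
```

Note the result is symmetric, since G_ij = v_i^T v_j = v_j^T v_i = G_ji.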

    # GRADED FUNCTION: gram_matrix

    def gram_matrix(A):
        """
        Argument:
        A -- matrix of shape (n_C, n_H*n_W)

        Returns:
        GA -- Gram matrix of A, of shape (n_C, n_C)
        """

        ### START CODE HERE ### (≈1 line)
        GA = tf.matmul(A, tf.transpose(A))
        ### END CODE HERE ###

        return GA
    tf.reset_default_graph()

    with tf.Session() as test:
        tf.set_random_seed(1)
        A = tf.random_normal([3, 2*1], mean=1, stddev=4)
        GA = gram_matrix(A)

        print("GA = " + str(GA.eval()))

3.2.2 - Style cost

After generating the Style matrix (Gram matrix), your goal will be to minimize the distance between the Gram matrix of the "style" image S and that of the "generated" image G. For now, we are using only a single hidden layer a^[l], and the corresponding style cost for this layer is defined as:

J_style^[l](S, G) = 1 / (4 * n_C^2 * (n_H * n_W)^2) * sum over i,j of (G_ij^(S) - G_ij^(G))^2

where G^(S) and G^(G) are respectively the Gram matrices of the "style" image and the "generated" image, computed using the hidden layer activations for a particular hidden layer in the network.

Exercise : Compute the style cost for a single layer.

Instructions: The steps to implement this function are:

  1. Retrieve dimensions from the hidden layer activations a_G:
    • To retrieve dimensions from a tensor X, use: X.get_shape().as_list()
  2. Unroll the hidden layer activations a_S and a_G into 2D matrices of shape (n_C, n_H*n_W), as explained in the picture above.
  3. Compute the Style matrix of the images S and G. (Use the function you had previously written.)
  4. Compute the Style cost: J_style_layer = 1 / (4 * n_C^2 * (n_H * n_W)^2) * sum((GS - GG)^2)
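These steps can be sketched in NumPy, with random activations standing in for real VGG features (the 1/(4 n_C^2 (n_H n_W)^2) normalization matches the graded code below):

```python
import numpy as np

np.random.seed(1)
n_H, n_W, n_C = 4, 4, 3
a_S = np.random.randn(1, n_H, n_W, n_C)   # toy "style" activations
a_G = np.random.randn(1, n_H, n_W, n_C)   # toy "generated" activations

# Step 2: unroll each activation volume to shape (n_C, n_H*n_W)
S = a_S.reshape(n_H * n_W, n_C).T
G = a_G.reshape(n_H * n_W, n_C).T

# Step 3: Gram matrices, each of shape (n_C, n_C)
GS = S @ S.T
GG = G @ G.T

# Step 4: normalized squared distance between the two Gram matrices
J_style_layer = np.sum((GS - GG) ** 2) / (4 * (n_C ** 2) * ((n_H * n_W) ** 2))
print(J_style_layer)
```

Because the cost is a sum of squares divided by a positive constant, it is always non-negative, and it is zero exactly when the two Gram matrices coincide.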
    # GRADED FUNCTION: compute_layer_style_cost

    def compute_layer_style_cost(a_S, a_G):
        """
        Arguments:
        a_S -- tensor of dimension (1, n_H, n_W, n_C), hidden layer activations representing style of the image S
        a_G -- tensor of dimension (1, n_H, n_W, n_C), hidden layer activations representing style of the image G

        Returns:
        J_style_layer -- tensor representing a scalar value, style cost defined above by equation (2)
        """

        ### START CODE HERE ###
        # Retrieve dimensions from a_G (≈1 line)
        m, n_H, n_W, n_C = a_G.get_shape().as_list()

        # Reshape the images to have them of shape (n_C, n_H*n_W) (≈2 lines)
        a_S = tf.transpose(tf.reshape(a_S, shape=[n_H*n_W, n_C]))
        a_G = tf.transpose(tf.reshape(a_G, shape=[n_H*n_W, n_C]))

        # Computing gram_matrices for both images S and G (≈2 lines)
        GS = gram_matrix(a_S)
        GG = gram_matrix(a_G)

        # Computing the loss (≈1 line)
        J_style_layer = tf.reduce_sum(tf.square(tf.subtract(GS, GG))) / (4 * (n_H*n_W)**2 * (n_C)**2)
        ### END CODE HERE ###

        return J_style_layer
    tf.reset_default_graph()

    with tf.Session() as test:
        tf.set_random_seed(1)
        a_S = tf.random_normal([1, 4, 4, 3], mean=1, stddev=4)
        a_G = tf.random_normal([1, 4, 4, 3], mean=1, stddev=4)
        J_style_layer = compute_layer_style_cost(a_S, a_G)

        print("J_style_layer = " + str(J_style_layer.eval()))

Expected Output :

J_style_layer 9.19028

3.2.3 Style Weights

So far you have captured the style from only one layer. We'll get better results if we "merge" style costs from several different layers. After completing this exercise, feel free to come back and experiment with different weights to see how it changes the generated image G. But for now, this is a pretty reasonable default:


You can combine the style costs for different layers as follows:

J_style(S, G) = sum over l of lambda^[l] * J_style^[l](S, G)

where the values for lambda^[l] are given in STYLE_LAYERS.
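The cell defining the weights did not survive in this copy. In the original assignment, the default STYLE_LAYERS gives five VGG layers (early to late) equal weight; the list below is reproduced from that assignment and should be treated as indicative rather than confirmed by this page:

```python
STYLE_LAYERS = [
    ('conv1_1', 0.2),
    ('conv2_1', 0.2),
    ('conv3_1', 0.2),
    ('conv4_1', 0.2),
    ('conv5_1', 0.2)]
```

Using several layers mixes fine textures (early layers) with larger-scale style structure (later layers).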

We've implemented a compute_style_cost(...) function. It simply calls your compute_layer_style_cost(...) several times, and weights their results using the values in STYLE_LAYERS. Read over it to make sure you understand what it's doing.

    def compute_style_cost(model, STYLE_LAYERS):
        """
        Computes the overall style cost from several chosen layers

        Arguments:
        model -- our tensorflow model
        STYLE_LAYERS -- A python list containing:
                        - the names of the layers we would like to extract style from
                        - a coefficient for each of them

        Returns:
        J_style -- tensor representing a scalar value, style cost defined above by equation (2)
        """

        # initialize the overall style cost
        J_style = 0

        for layer_name, coeff in STYLE_LAYERS:

            # Select the output tensor of the currently selected layer
            out = model[layer_name]

            # Set a_S to be the hidden layer activation from the layer we have selected, by running the session on out
            a_S = sess.run(out)

            # Set a_G to be the hidden layer activation from same layer. Here, a_G references model[layer_name]
            # and isn't evaluated yet. Later in the code, we'll assign the image G as the model input, so that
            # when we run the session, this will be the activations drawn from the appropriate layer, with G as input.
            a_G = out

            # Compute style_cost for the current layer
            J_style_layer = compute_layer_style_cost(a_S, a_G)

            # Add coeff * J_style_layer of this layer to overall style cost
            J_style += coeff * J_style_layer

        return J_style

Note : In the inner-loop of the for-loop above, a_G is a tensor and hasn't been evaluated yet. It will be evaluated and updated at each iteration when we run the TensorFlow graph in model_nn() below.

What you should remember :

The style of an image can be represented using the Gram matrix of a hidden layer's activations. However, we get even better results by combining this representation from multiple different layers. This is in contrast to the content representation, where usually using just a single hidden layer is sufficient. Minimizing the style cost will cause the image G to follow the style of the image S.

3.3 - Defining the total cost to optimize

Finally, let's create a cost function that minimizes both the style and the content cost. The formula is:

J(G) = alpha * J_content(C, G) + beta * J_style(S, G)

Exercise : Implement the total cost function which includes both the content cost and the style cost.

    # GRADED FUNCTION: total_cost

    def total_cost(J_content, J_style, alpha = 10, beta = 40):
        """
        Computes the total cost function

        Arguments:
        J_content -- content cost coded above
        J_style -- style cost coded above
        alpha -- hyperparameter weighting the importance of the content cost
        beta -- hyperparameter weighting the importance of the style cost

        Returns:
        J -- total cost as defined by the formula above.
        """

        ### START CODE HERE ### (≈1 line)
        J = alpha*J_content + beta*J_style
        ### END CODE HERE ###

        return J
    tf.reset_default_graph()

    with tf.Session() as test:
        np.random.seed(3)
        J_content = np.random.randn()
        J_style = np.random.randn()
        J = total_cost(J_content, J_style)
        print("J = " + str(J))

4 - Solving the optimization problem

Finally, let's put everything together to implement Neural Style Transfer!

Here's what the program will have to do:

  1. Create an Interactive Session
  2. Load the content image
  3. Load the style image
  4. Randomly initialize the image to be generated
  5. Load the VGG-19 model
  6. Build the TensorFlow graph:
    • Run the content image through the VGG-19 model and compute the content cost
    • Run the style image through the VGG-19 model and compute the style cost
    • Compute the total cost
    • Define the optimizer and the learning rate
  7. Initialize the TensorFlow graph and run it for a large number of iterations, updating the generated image at every step.

Let's go through the individual steps in detail.

You've previously implemented the overall cost J(G). We'll now set up TensorFlow to optimize this with respect to G. To do so, your program has to reset the graph and use an "Interactive Session". Unlike a regular session, the "Interactive Session" installs itself as the default session to build a graph. This allows you to run variables without constantly needing to refer to the session object, which simplifies the code.

Let's start the interactive session.

    # Reset the graph
    tf.reset_default_graph()

    # Start interactive session
    sess = tf.InteractiveSession()

Let's load, reshape, and normalize our "content" image (the Louvre museum picture):

    content_image = scipy.misc.imread("images/louvre_small.jpg")
    content_image = reshape_and_normalize_image(content_image)

Now, we initialize the "generated" image as a noisy image created from the content_image. By initializing the pixels of the generated image to be mostly noise but still slightly correlated with the content image, this will help the content of the "generated" image more rapidly match the content of the "content" image. (Feel free to look in nst_utils.py to see the details of generate_noise_image(...); to do so, click "File-->Open..." at the upper-left corner of this Jupyter notebook.)
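Since nst_utils.py is not reproduced here, this NumPy sketch shows what generate_noise_image presumably does, based on the description above: interpolate between uniform noise and the content image. The noise range (-20, 20) and the 0.6 noise ratio are recalled from the original assignment, not confirmed by this page:

```python
import numpy as np

def generate_noise_image_sketch(content_image, noise_ratio=0.6):
    """Mostly-noise image, slightly correlated with the content image."""
    noise = np.random.uniform(-20, 20, content_image.shape).astype('float32')
    # Weighted blend: mostly noise, with a trace of the content image
    return noise * noise_ratio + content_image * (1 - noise_ratio)

content = np.zeros((1, 300, 400, 3), dtype='float32')  # stand-in content image
g = generate_noise_image_sketch(content)
print(g.shape)  # same shape as the content image
```

The slight correlation with the content image gives gradient descent a head start toward matching the content.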

    generated_image = generate_noise_image(content_image)
    imshow(generated_image[0])

    model = load_vgg_model("pretrained-model/imagenet-vgg-verydeep-19.mat") # load the VGG-19 model

To get the program to compute the content cost, we will now assign a_C and a_G to be the appropriate hidden layer activations. We will use layer conv4_2 to compute the content cost. The code below does the following:

  1. Assign the content image to be the input to the VGG model.
  2. Set a_C to be the tensor giving the hidden layer activation for layer "conv4_2".
  3. Set a_G to be the tensor giving the hidden layer activation for the same layer.
  4. Compute the content cost using a_C and a_G.
    # Assign the content image to be the input of the VGG model.
    sess.run(model['input'].assign(content_image))

    # Select the output tensor of layer conv4_2
    out = model['conv4_2']

    # Set a_C to be the hidden layer activation from the layer we have selected
    a_C = sess.run(out)

    # Set a_G to be the hidden layer activation from same layer. Here, a_G references model['conv4_2']
    # and isn't evaluated yet. Later in the code, we'll assign the image G as the model input, so that
    # when we run the session, this will be the activations drawn from the appropriate layer, with G as input.
    a_G = out

    # Compute the content cost
    J_content = compute_content_cost(a_C, a_G)

Note: At this point, a_G is a tensor and hasn't been evaluated. It will be evaluated and updated at each iteration when we run the TensorFlow graph in model_nn() below.
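The cell that loads style_image did not survive in this copy. In the original assignment it mirrors the content-image loading above; the Monet filename below is recalled from that assignment, so treat it as an assumption:

```python
style_image = scipy.misc.imread("images/monet_800600.jpg")  # filename assumed
style_image = reshape_and_normalize_image(style_image)
```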

    # Assign the input of the model to be the "style" image
    sess.run(model['input'].assign(style_image))

    # Compute the style cost
    J_style = compute_style_cost(model, STYLE_LAYERS)

Exercise: Now that you have J_content and J_style, compute the total cost J by calling total_cost(). Use alpha = 10 and beta = 40.

    ### START CODE HERE ### (1 line)
    J = total_cost(J_content, J_style, alpha = 10, beta = 40)
    ### END CODE HERE ###

You've previously learned how to set up the Adam optimizer in TensorFlow. Let's do that here, using a learning rate of 2.0.

    # define optimizer (1 line)
    optimizer = tf.train.AdamOptimizer(2.0)

    # define train_step (1 line)
    train_step = optimizer.minimize(J)

Exercise: Implement the model_nn() function which initializes the variables of the TensorFlow graph, assigns the input image (initial generated image) as the input of the VGG-19 model, and runs the train_step for a large number of steps.
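A sketch of one way to implement this, following the structure used elsewhere in this notebook. It relies on the global sess, model, train_step, J, J_content and J_style defined above, and on save_image from nst_utils; the 20-iteration logging interval and the output filenames are assumptions, not confirmed by this page:

```python
def model_nn(sess, input_image, num_iterations = 200):

    # Initialize global variables (you need to run the session on the initializer)
    sess.run(tf.global_variables_initializer())

    # Run the noisy input image (initial generated image) through the model
    sess.run(model['input'].assign(input_image))

    for i in range(num_iterations):

        # Run the session on train_step to minimize the total cost
        sess.run(train_step)

        # Compute the generated image by running the session on the current model['input']
        generated_image = sess.run(model['input'])

        # Print the costs every 20 iterations
        if i % 20 == 0:
            Jt, Jc, Js = sess.run([J, J_content, J_style])
            print("Iteration " + str(i) + " :")
            print("total cost = " + str(Jt))
            print("content cost = " + str(Jc))
            print("style cost = " + str(Js))

            # save the current generated image in the "output" directory
            save_image("output/" + str(i) + ".png", generated_image)

    # save the last generated image
    save_image('output/generated_image.jpg', generated_image)

    return generated_image
```

Each sess.run(train_step) nudges the pixels of model['input'] downhill on J, so the generated image is the optimization variable itself rather than a network weight.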
