Image Classification Using a Pre-Trained Model

In this lesson, you will learn to use a pre-trained model to detect objects in a given image. You will use the SqueezeNet pre-trained model, which detects and classifies the objects in a given image with great accuracy.

Open a new Jupyter notebook and follow the steps below to develop this image classification application.

Importing Libraries

First, we import the required packages using the following code −

from caffe2.proto import caffe2_pb2
from caffe2.python import core, workspace, models
import numpy as np
import skimage.io
import skimage.transform
from matplotlib import pyplot
import os
import urllib.request as urllib2
import operator

Next, we set up a few variables −

INPUT_IMAGE_SIZE = 227
mean = 128

The images used for training will obviously be of varied sizes. All these images must be converted into a fixed size for accurate training. Likewise, the test images and the images which you want to predict in the production environment must also be converted to the same size as the one used during training. Thus, we create a variable called INPUT_IMAGE_SIZE having the value 227. Hence, we will convert all our images to the size 227x227 before using them in our classifier.

We also declare a variable called mean having the value 128, which is used later for improving the classification results.

Next, we will develop two functions for processing the image.

Image Processing

Image processing consists of two steps: the first is to resize the image, and the second is to centrally crop it. We will write a function for each of these two steps.

Image Resizing

First, we will write a function for resizing the image. As said earlier, we will resize the image to 227x227. Let us define the resize function as follows −

def resize(img, input_height, input_width):

We obtain the aspect ratio of the image by dividing the width by the height.

original_aspect = img.shape[1] / float(img.shape[0])

If the aspect ratio is greater than 1, the image is wide, that is to say, it is in landscape mode. The height is set to the input size, the width is scaled to preserve the aspect ratio, and the resized image is returned using the following code −

if(original_aspect > 1):
    # landscape: height is set to the input size, width is scaled to keep the aspect ratio
    # (skimage.transform.resize takes the output shape as (rows, cols))
    new_height = int(original_aspect * input_height)
    return skimage.transform.resize(img, (input_width, new_height),
                                    mode='constant', anti_aliasing=True, anti_aliasing_sigma=None)

If the aspect ratio is less than 1, it indicates portrait mode. The width is now set to the input size and the height is scaled accordingly −

if(original_aspect < 1):
    # portrait: width is set to the input size, height is scaled to keep the aspect ratio
    new_width = int(input_width / original_aspect)
    return skimage.transform.resize(img, (new_width, input_height),
                                    mode='constant', anti_aliasing=True, anti_aliasing_sigma=None)

If the aspect ratio equals 1 , we do not make any height/width adjustments.

if(original_aspect == 1):
    # square image: no aspect adjustment is needed
    return skimage.transform.resize(img, (input_width, input_height),
                                    mode='constant', anti_aliasing=True, anti_aliasing_sigma=None)

The full function code is given below for your quick reference −

def resize(img, input_height, input_width):
    # aspect ratio = width / height
    original_aspect = img.shape[1] / float(img.shape[0])
    if(original_aspect > 1):
        # landscape: height is set to the input size, width is scaled
        new_height = int(original_aspect * input_height)
        return skimage.transform.resize(img, (input_width, new_height),
                                        mode='constant', anti_aliasing=True, anti_aliasing_sigma=None)
    if(original_aspect < 1):
        # portrait: width is set to the input size, height is scaled
        new_width = int(input_width / original_aspect)
        return skimage.transform.resize(img, (new_width, input_height),
                                        mode='constant', anti_aliasing=True, anti_aliasing_sigma=None)
    if(original_aspect == 1):
        # square image: no aspect adjustment is needed
        return skimage.transform.resize(img, (input_width, input_height),
                                        mode='constant', anti_aliasing=True, anti_aliasing_sigma=None)

We will now write a function for cropping the image around its center.

Image Cropping

We declare the crop_image function as follows −

def crop_image(img, cropx, cropy):

We extract the dimensions of the image using the following statement −

y, x, c = img.shape

We compute the starting coordinates of the crop, centered on the image, using the following two lines of code −

startx = x//2 - (cropx//2)
starty = y//2 - (cropy//2)

Finally, we return the cropped image by slicing the array with the new dimensions −

return img[starty:starty+cropy, startx:startx+cropx]

The entire function code is given below for your quick reference −

def crop_image(img, cropx, cropy):
    # crop a cropx x cropy window from the centre of the image
    y, x, c = img.shape
    startx = x//2 - (cropx//2)
    starty = y//2 - (cropy//2)
    return img[starty:starty+cropy, startx:startx+cropx]
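If you would like to check these two helpers before working with a real image, the following optional snippet (an illustrative sketch that is not part of the original tutorial; it reuses the imports and functions defined above) verifies the output shapes on a random array −

# optional sanity check on a random 480 x 640 RGB array
dummy = np.random.rand(480, 640, 3).astype(np.float32)
resized = resize(dummy, INPUT_IMAGE_SIZE, INPUT_IMAGE_SIZE)
cropped = crop_image(resized, INPUT_IMAGE_SIZE, INPUT_IMAGE_SIZE)
print(resized.shape)   # height is 227, width is scaled to keep the aspect ratio
print(cropped.shape)   # (227, 227, 3)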

Now, we will write code to test these functions.

Processing the Image

First, copy an image file into the images subfolder within your project directory. In this example, the file tree.jpg is used. The following Python code loads the image and displays it −

img = skimage.img_as_float(skimage.io.imread("images/tree.jpg")).astype(np.float32)
print("Original Image Shape: ", img.shape)
pyplot.figure()
pyplot.imshow(img)
pyplot.title('Original image')

The output is as follows −

[Figure: the original image is displayed]

Note that the size of the original image is 600 x 960. We need to resize it to our specification of 227 x 227. Calling our earlier-defined resize function does this job −

img = resize(img, INPUT_IMAGE_SIZE, INPUT_IMAGE_SIZE)
print("Image Shape after resizing: ", img.shape)
pyplot.figure()
pyplot.imshow(img)
pyplot.title('Resized image')

The output is as given below −

[Figure: the resized image is displayed]

Note that the image size is now 227 x 363. We need to crop it to 227 x 227 before feeding it to our algorithm. We call the previously-defined crop_image function for this purpose −

img = crop_image(img, INPUT_IMAGE_SIZE, INPUT_IMAGE_SIZE)
print("Image Shape after cropping: ", img.shape)
pyplot.figure()
pyplot.imshow(img)
pyplot.title('Center Cropped')

The output of the code is shown below −

[Figure: the center-cropped image is displayed]

At this point, the image is of size 227 x 227 and is ready for further processing. We now swap the image axes to move the three colour channels into the first dimension (HWC to CHW).

img = img.swapaxes(1, 2).swapaxes(0, 1)
print("CHW Image Shape: ", img.shape)

Given below is the output −

CHW Image Shape: (3, 227, 227)
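Incidentally, the two swapaxes calls are equivalent to a single np.transpose. A tiny optional check (not part of the original tutorial) demonstrates this on a stand-in array −

# the two swapaxes calls and np.transpose produce the same CHW layout
hwc = np.random.rand(227, 227, 3).astype(np.float32)
assert np.array_equal(hwc.swapaxes(1, 2).swapaxes(0, 1), np.transpose(hwc, (2, 0, 1)))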

Note that the last axis has now become the first dimension in the array. We will now plot the three channels using the following code −

pyplot.figure()
for i in range(3):
    pyplot.subplot(1, 3, i+1)
    pyplot.imshow(img[i])
    pyplot.axis('off')
    pyplot.title('RGB channel %d' % (i+1))

The output is shown below −

[Figure: the three colour channels are displayed side by side]

Finally, we do some additional processing on the image: converting Red Green Blue to Blue Green Red (RGB to BGR), subtracting the mean for better results, and adding a batch size axis, using the following three lines of code −

# convert RGB --> BGR
img = img[(2, 1, 0), :, :]
# rescale to [0, 255] and subtract the mean
img = img * 255 - mean
# add batch size axis
img = img[np.newaxis, :, :, :].astype(np.float32)

At this point, your image is in NCHW format and is ready for feeding into our network. Next, we will load our pre-trained model files and feed the above image into them for prediction.
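If you intend to classify many images, the preprocessing steps above can be folded into a single helper. The sketch below merely chains the resize, crop, axis swap, BGR conversion, mean subtraction and batch axis steps already shown; the function name preprocess_image is our own and is not part of the original tutorial −

def preprocess_image(path, size=INPUT_IMAGE_SIZE):
    # load the image, then resize and centre-crop it to size x size
    image = skimage.img_as_float(skimage.io.imread(path)).astype(np.float32)
    image = resize(image, size, size)
    image = crop_image(image, size, size)
    # HWC -> CHW, RGB -> BGR, rescale to [0, 255], subtract the mean, add batch axis
    image = image.swapaxes(1, 2).swapaxes(0, 1)
    image = image[(2, 1, 0), :, :]
    image = image * 255 - mean
    return image[np.newaxis, :, :, :].astype(np.float32)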

Predicting Objects in the Processed Image

We first set up the paths to the init and predict networks defined in Caffe2's pre-trained models.

Setting Model File Paths

Recall from our earlier discussion that all the pre-trained models are installed in the models folder. We set up the path to this folder as follows −

CAFFE_MODELS = os.path.expanduser("/anaconda3/lib/python3.7/site-packages/caffe2/python/models")
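The path above is specific to the Anaconda installation used in this tutorial. If your Caffe2 installation lives elsewhere, one possible way to locate the folder (a suggestion, not part of the original tutorial) is to derive it from the models package imported earlier −

# assumption: the pre-trained models were downloaded into the caffe2.python.models package folder
CAFFE_MODELS = os.path.dirname(models.__file__)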

We set up the path to the init_net protobuf file of the squeezenet model as follows −

INIT_NET = os.path.join(CAFFE_MODELS, 'squeezenet', 'init_net.pb')

Likewise, we set up the path to the predict_net protobuf as follows −

PREDICT_NET = os.path.join(CAFFE_MODELS, 'squeezenet', 'predict_net.pb')

We print the two paths for diagnostic purposes −

print(INIT_NET)
print(PREDICT_NET)

The above code along with the output is given here for your quick reference −

CAFFE_MODELS = os.path.expanduser("/anaconda3/lib/python3.7/site-packages/caffe2/python/models")
INIT_NET = os.path.join(CAFFE_MODELS, 'squeezenet', 'init_net.pb')
PREDICT_NET = os.path.join(CAFFE_MODELS, 'squeezenet', 'predict_net.pb')
print(INIT_NET)
print(PREDICT_NET)

The output is shown below −

/anaconda3/lib/python3.7/site-packages/caffe2/python/models/squeezenet/init_net.pb
/anaconda3/lib/python3.7/site-packages/caffe2/python/models/squeezenet/predict_net.pb

Next, we will create a predictor.

Creating the Predictor

We read the model files using the following two statements −

with open(INIT_NET, "rb") as f:
    init_net = f.read()
with open(PREDICT_NET, "rb") as f:
    predict_net = f.read()

The predictor is created by passing the contents of the two files as parameters to the Predictor function −

p = workspace.Predictor(init_net, predict_net)

The p object is the predictor, which is used for predicting the objects in any given image. Note that each input image must be in NCHW format, as we prepared earlier for our tree.jpg file.

Predicting Objects

Predicting the objects in a given image is trivial: it requires just a single line of code. We call the run method on the predictor object to detect the objects in the given image −

results = p.run({'data': img})

The prediction results are now available in the results object, which we convert to a numpy array for readability −

results = np.asarray(results)

Print the dimensions of the array using the following statement −

print("results shape: ", results.shape)

The output is as shown below −

results shape: (1, 1, 1000, 1, 1)

The 1000 entries in the third axis are the scores for the model's 1000 ImageNet classes. We will now remove the unnecessary axes −

preds = np.squeeze(results)

The top prediction can now be retrieved by taking the max value in the preds array −

curr_pred, curr_conf = max(enumerate(preds), key=operator.itemgetter(1))
print("Prediction: ", curr_pred)
print("Confidence: ", curr_conf)

The output is as follows −

Prediction: 984
Confidence: 0.89235985

As you can see, the model has predicted an object with index value 984 with 89% confidence. The index 984 by itself does not tell us what kind of object was detected. We need to obtain the human-readable name for the object using its index value. The kinds of objects that the model recognizes, along with their corresponding index values, are available in a GitHub repository.
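Incidentally, if you want to look beyond the single best class, a short optional snippet (not part of the original tutorial) lists the five most confident indices and their scores −

# print the five most confident class indices and their scores
top5 = np.argsort(preds)[::-1][:5]
for idx in top5:
    print("index:", idx, "confidence:", preds[idx])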

Now, we will see how to retrieve the name for our object having index value of 984.

Stringifying the Result

We define the URL of the codes file hosted on GitHub as follows −

codes = "https://gist.githubusercontent.com/aaronmarkham/cd3a6b6ac071eca6f7b4a6e40e6038aa/raw/9edb4038a37da6b5a44c3b5bc52e448ff09bfe5b/alexnet_codes"

We read the contents of the URL −

response = urllib2.urlopen(codes)

The response contains a list of all the codes and their descriptions. A few lines of the response are shown below so you can see what it contains −

5: 'electric ray, crampfish, numbfish, torpedo',
6: 'stingray',
7: 'cock',
8: 'hen',
9: 'ostrich, Struthio camelus',
10: 'brambling, Fringilla montifringilla',

We now iterate over the entire response to locate our desired code, 984, using a for loop as follows −

for line in response:
    mystring = line.decode('ascii')
    # split each line into the numeric code and its description
    code, result = mystring.partition(":")[::2]
    code = code.strip()
    result = result.replace("'", "")
    if (code == str(curr_pred)):
        # keep only the first name in the comma-separated description
        name = result.split(",")[0][1:]
        print("Model predicts", name, "with", curr_conf, "confidence")

When you run the code, you will see the following output −

Model predicts rapeseed with 0.89235985 confidence

You may now try the model on another image.

Predicting a Different Image

To predict another image, simply copy the image file into the images folder of your project directory. This is the directory in which our earlier tree.jpg file is stored. Then change the name of the image file in the code. Only one change is required, as shown below −

img = skimage.img_as_float(skimage.io.imread("images/pretzel.jpg")).astype(np.float32)

The original picture and the prediction result are shown below −

[Figure: the original pretzel image is displayed]

The output is shown below −

Model predicts pretzel with 0.99999976 confidence

As you can see, the pre-trained model is able to detect objects in a given image with a great degree of accuracy.
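To avoid editing the notebook for every new file, you could wrap the prediction steps into one reusable function. The sketch below only rearranges code already shown in this lesson; the name classify_image and the preprocess_image helper sketched earlier are our own additions, not part of the original tutorial −

def classify_image(path):
    # preprocess_image is the helper sketched earlier in this lesson
    data = preprocess_image(path)
    results = np.asarray(p.run({'data': data}))
    preds = np.squeeze(results)
    curr_pred, curr_conf = max(enumerate(preds), key=operator.itemgetter(1))
    return curr_pred, curr_conf

# example usage
# index, confidence = classify_image("images/pretzel.jpg")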

Full Source

The full source of the above code, which uses a pre-trained model for object detection in a given image, is given here for your quick reference. The imports, the variables, and the resize function defined earlier in this lesson are included so that the listing runs on its own −

from caffe2.proto import caffe2_pb2
from caffe2.python import core, workspace, models
import numpy as np
import skimage.io
import skimage.transform
from matplotlib import pyplot
import os
import urllib.request as urllib2
import operator

INPUT_IMAGE_SIZE = 227
mean = 128

def resize(img, input_height, input_width):
    original_aspect = img.shape[1] / float(img.shape[0])
    if(original_aspect > 1):
        new_height = int(original_aspect * input_height)
        return skimage.transform.resize(img, (input_width, new_height),
                                        mode='constant', anti_aliasing=True, anti_aliasing_sigma=None)
    if(original_aspect < 1):
        new_width = int(input_width / original_aspect)
        return skimage.transform.resize(img, (new_width, input_height),
                                        mode='constant', anti_aliasing=True, anti_aliasing_sigma=None)
    if(original_aspect == 1):
        return skimage.transform.resize(img, (input_width, input_height),
                                        mode='constant', anti_aliasing=True, anti_aliasing_sigma=None)

def crop_image(img, cropx, cropy):
    y, x, c = img.shape
    startx = x//2 - (cropx//2)
    starty = y//2 - (cropy//2)
    return img[starty:starty+cropy, startx:startx+cropx]

img = skimage.img_as_float(skimage.io.imread("images/pretzel.jpg")).astype(np.float32)
print("Original Image Shape: ", img.shape)
pyplot.figure()
pyplot.imshow(img)
pyplot.title('Original image')

img = resize(img, INPUT_IMAGE_SIZE, INPUT_IMAGE_SIZE)
print("Image Shape after resizing: ", img.shape)
pyplot.figure()
pyplot.imshow(img)
pyplot.title('Resized image')

img = crop_image(img, INPUT_IMAGE_SIZE, INPUT_IMAGE_SIZE)
print("Image Shape after cropping: ", img.shape)
pyplot.figure()
pyplot.imshow(img)
pyplot.title('Center Cropped')

img = img.swapaxes(1, 2).swapaxes(0, 1)
print("CHW Image Shape: ", img.shape)
pyplot.figure()
for i in range(3):
    pyplot.subplot(1, 3, i+1)
    pyplot.imshow(img[i])
    pyplot.axis('off')
    pyplot.title('RGB channel %d' % (i+1))

# convert RGB --> BGR
img = img[(2, 1, 0), :, :]
# remove mean
img = img * 255 - mean
# add batch size axis
img = img[np.newaxis, :, :, :].astype(np.float32)

CAFFE_MODELS = os.path.expanduser("/anaconda3/lib/python3.7/site-packages/caffe2/python/models")
INIT_NET = os.path.join(CAFFE_MODELS, 'squeezenet', 'init_net.pb')
PREDICT_NET = os.path.join(CAFFE_MODELS, 'squeezenet', 'predict_net.pb')
print(INIT_NET)
print(PREDICT_NET)

with open(INIT_NET, "rb") as f:
    init_net = f.read()
with open(PREDICT_NET, "rb") as f:
    predict_net = f.read()

p = workspace.Predictor(init_net, predict_net)
results = p.run({'data': img})
results = np.asarray(results)
print("results shape: ", results.shape)
preds = np.squeeze(results)
curr_pred, curr_conf = max(enumerate(preds), key=operator.itemgetter(1))
print("Prediction: ", curr_pred)
print("Confidence: ", curr_conf)

codes = "https://gist.githubusercontent.com/aaronmarkham/cd3a6b6ac071eca6f7b4a6e40e6038aa/raw/9edb4038a37da6b5a44c3b5bc52e448ff09bfe5b/alexnet_codes"
response = urllib2.urlopen(codes)
for line in response:
    mystring = line.decode('ascii')
    code, result = mystring.partition(":")[::2]
    code = code.strip()
    result = result.replace("'", "")
    if (code == str(curr_pred)):
        name = result.split(",")[0][1:]
        print("Model predicts", name, "with", curr_conf, "confidence")

By now, you know how to use a pre-trained model for making predictions on your dataset.

Next, you will learn how to define your own neural network (NN) architectures in Caffe2 and train them on your dataset. We will now learn how to create a trivial single-layer NN.

Translated from: https://www.tutorialspoint.com/caffe2/caffe2_image_classification_using_pre_trained_model.htm
