
Multi-Label Classification: Evaluation Metrics

Metrics play quite an important role in the field of Machine Learning or Deep Learning. We start problems with metric selection so as to know the baseline score of a particular model. In this blog, we look into the best and most commonly used metrics for Multi-Label Classification and how they differ from the usual metrics.

Let us first look at what Multi-Label Classification is, just in case it proves useful for you. Given data on a dog's features, the task is to predict both which breed it belongs to and whether it is a pet.

In the context of Object Detection, Multi-Label Classification gives us a list of all the objects present in an image. Here, the classifier detects three objects in the image. Assuming four distinct trained categories (dog, human, bicycle, and truck), this can be represented as the binary vector [1 0 1 1].

[Figure: Object Detection output]

This kind of classification is known as Multi-Label Classification.

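To make the encoding concrete, here is a minimal sketch of how a list of detected objects maps to that binary vector (the category order is an assumption carried over from the example above):

#a minimal sketch: encoding detected objects as a binary label vector
#the category order (dog, human, bicycle, truck) is assumed from the example above
categories = ["dog", "human", "bicycle", "truck"]
detected = {"dog", "bicycle", "truck"}

binary_vector = [1 if c in detected else 0 for c in categories]
print(binary_vector)  #[1, 0, 1, 1]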

The primary and most frequently used metrics in Multi-Label Classification are the following:

  1. Precision at k (P@k)

  2. Average Precision at k (AP@k)

  3. Mean Average Precision at k (MAP@k)

  4. Sampled F1 Score

Let’s get into the details of these metrics.


Precision at k (P@k):

Given a list of actual classes and a list of predicted classes, precision at k is defined as the number of correct predictions, considering only the top k elements of the prediction list, divided by k. The values range between 0 and 1.

Here is an example explaining the same in code:

def patk(actual, pred, k):
    #we return 0 if k is 0 because
    #we can't divide the number of common values by 0
    if k == 0:
        return 0

    #taking only the top k predictions in a class
    k_pred = pred[:k]

    #taking the set of the actual values
    actual_set = set(actual)

    #taking the set of the predicted values
    pred_set = set(k_pred)

    #taking the intersection of the actual set and the pred set
    #to find the common values
    common_values = actual_set.intersection(pred_set)

    return len(common_values) / len(k_pred)

#defining the values of the actual and the predicted class
y_true = [1, 2, 0]
y_pred = [1, 1, 0]

if __name__ == "__main__":
    print(patk(y_true, y_pred, 3))

Running the above code, we get the following result.

0.6666666666666666

In this case, the actual value 2 was predicted as 1, so only two of the three values are common to both lists, which brings the score down to 2/3.

Average Precision at k (AP@k):

It is defined as the average of all the precision values P@i for i from 1 to k. To make it clearer, let us look at some code. The values range between 0 and 1.

import numpy as np
#pk.py contains the patk function defined above
import pk

def apatk(actual, pred, k):
    #creating a list for storing the precision values for each i from 1 to k
    precision_ = []
    for i in range(1, k+1):
        #calculating the precision at different values of i
        #and appending them to the list
        precision_.append(pk.patk(actual, pred, i))

    #return 0 if there are no values in the list
    if len(precision_) == 0:
        return 0

    #returning the average of all the precision values
    return np.mean(precision_)

#defining the values of the actual and the predicted class
y_true = [[1,2,0,1], [0,4], [3], [1,2]]
y_pred = [[1,1,0,1], [1,4], [2], [1,3]]

if __name__ == "__main__":
    for i in range(len(y_true)):
        for j in range(1, 4):
            print(
                f"""
                y_true = {y_true[i]}
                y_pred = {y_pred[i]}
                AP@{j} = {apatk(y_true[i], y_pred[i], k=j)}
                """
            )

Here we check the AP@k for k from 1 to 3. We get the following output.

y_true = [1, 2, 0, 1]
y_pred = [1, 1, 0, 1]
AP@1 = 1.0

y_true = [1, 2, 0, 1]
y_pred = [1, 1, 0, 1]
AP@2 = 0.75

y_true = [1, 2, 0, 1]
y_pred = [1, 1, 0, 1]
AP@3 = 0.7222222222222222

y_true = [0, 4]
y_pred = [1, 4]
AP@1 = 0.0

y_true = [0, 4]
y_pred = [1, 4]
AP@2 = 0.25

y_true = [0, 4]
y_pred = [1, 4]
AP@3 = 0.3333333333333333

y_true = [3]
y_pred = [2]
AP@1 = 0.0

y_true = [3]
y_pred = [2]
AP@2 = 0.0

y_true = [3]
y_pred = [2]
AP@3 = 0.0

y_true = [1, 2]
y_pred = [1, 3]
AP@1 = 1.0

y_true = [1, 2]
y_pred = [1, 3]
AP@2 = 0.75

y_true = [1, 2]
y_pred = [1, 3]
AP@3 = 0.6666666666666666

This gives us a clear understanding of how the code works.


Mean Average Precision at k (MAP@k):

The mean of all the AP@k values over the whole training data is known as MAP@k. It gives us an accurate representation of the accuracy of the whole prediction data. Here is the code for it.

The values range between 0 and 1.


import numpy as np
#apk.py contains the apatk function defined above
import apk

def mapk(actual, pred, k):
    #creating a list for storing the Average Precision values
    average_precision = []
    #iterating through the whole data and calculating the AP@k for each sample
    for i in range(len(actual)):
        average_precision.append(apk.apatk(actual[i], pred[i], k))

    #returning the mean of all the Average Precision values
    return np.mean(average_precision)

#defining the values of the actual and the predicted class
y_true = [[1,2,0,1], [0,4], [3], [1,2]]
y_pred = [[1,1,0,1], [1,4], [2], [1,3]]

if __name__ == "__main__":
    print(mapk(y_true, y_pred, 3))

Running the above code, we get the output as follows.


0.4305555555555556

Here, the score is bad as the prediction set has many errors.


Sampled F1 Score (F1 Samples):

This metric calculates the F1 score for each instance in the dataset and then averages these scores. We will use sklearn's implementation of it in our code.

The F1 score is covered in detail in sklearn's documentation. The scores range between 0 and 1.

First, the data is converted into a binary (multi-hot) format; the F1 score is then computed on these binarized values. This gives us the required output.
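To see what this binarization looks like, here is a small illustrative snippet (using the first two label lists from the example below):

from sklearn.preprocessing import MultiLabelBinarizer

#illustrative only: binarizing two of the label lists used below
mlb = MultiLabelBinarizer()
print(mlb.fit_transform([[1, 2, 0, 1], [0, 4]]))
#the learned label space is [0, 1, 2, 4], so the output is
#[[1 1 1 0]
# [1 0 0 1]]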

from sklearn.metrics import f1_score
from sklearn.preprocessing import MultiLabelBinarizer

def f1_sampled(actual, pred):
    #converting the multi-label classification to a binary output
    #fitting the binarizer on both lists so that the actual and the
    #predicted values are encoded in the same label space
    mlb = MultiLabelBinarizer()
    mlb.fit(actual + pred)
    actual = mlb.transform(actual)
    pred = mlb.transform(pred)

    #calculating the f1 score, averaged per sample
    f1 = f1_score(actual, pred, average="samples")
    return f1

#defining the values of the actual and the predicted class
y_true = [[1,2,0,1], [0,4], [3], [1,2]]
y_pred = [[1,1,0,1], [1,4], [2], [1,3]]

if __name__ == "__main__":
    print(f1_sampled(y_true, y_pred))

The output of the above code will be the following:

0.45

Since the F1 score ranges between 0 and 1, a score of 0.45 is on the low side. This is because the prediction set contains many errors; with a better prediction set, the F1 score would be closer to 1.

Therefore, for this kind of problem, we generally use metrics such as Mean Average Precision at k, the sampled F1 score, and Log Loss. Setting up metrics appropriate to your specific problem is essential.
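Log Loss is not implemented above; as a minimal sketch (the binarized ground truth and predicted probabilities here are hypothetical, since the examples in this post only contain hard label lists), a multi-label log loss could be computed by averaging the binary log loss over the label columns:

import numpy as np
from sklearn.metrics import log_loss

#hypothetical binarized ground truth and predicted probabilities
#(rows = samples, columns = labels)
y_true_bin = np.array([[1, 0, 1],
                       [0, 1, 0]])
y_prob = np.array([[0.9, 0.2, 0.7],
                   [0.1, 0.8, 0.3]])

#averaging the binary log loss over the label columns
losses = [log_loss(y_true_bin[:, j], y_prob[:, j], labels=[0, 1])
          for j in range(y_true_bin.shape[1])]
print(np.mean(losses))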

I want to express my gratitude to Abhishek for his book, Approaching (Any) Machine Learning Problem, without which this blog would not have been possible.

Translated from the Analytics Vidhya article "Metrics for Multi-Label Classification".

Several key metrics used when evaluating model performance in multi-label classification are accuracy, precision, recall, and the F1 score.

Accuracy measures the proportion of samples, both positive and negative, that the model predicts correctly.

Precision is the proportion of samples correctly predicted as positive among all samples predicted as positive.

Recall is the proportion of samples correctly predicted as positive among all actual positive samples.

The F1 score is the harmonic mean of precision and recall, and it plays an important role in balancing the two.
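As a small worked example of these definitions (the counts here are hypothetical):

#hypothetical counts for a single label
tp, fp, fn = 8, 2, 4  #true positives, false positives, false negatives

precision = tp / (tp + fp)                          #8/10 = 0.8
recall = tp / (tp + fn)                             #8/12 = 0.667
f1 = 2 * precision * recall / (precision + recall)  #= 0.727
print(precision, recall, f1)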
