
Multi-Label Classification: Evaluation Metrics

Metrics play quite an important role in the field of Machine Learning or Deep Learning. We start problems with metric selection so as to know the baseline score of a particular model. In this blog, we look into the best and most commonly used metrics for Multi-Label Classification and how they differ from the usual metrics.

Let us first look at what Multi-Label Classification is, just in case it proves useful for you. Given data on a dog's features, the task is to predict both which breed it belongs to and whether it is a pet.

In the context of Object Detection, Multi-Label Classification gives us a list of all the objects present in an image. Here, the classifier detects three objects in the image. Assuming four distinct trained categories (dog, human, bicycle, and truck), this can be represented as the binary vector [1 0 1 1].

[Figure: Object Detection output]

This kind of classification is known as Multi-Label Classification.

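To make the encoding concrete, here is a minimal sketch of how a list of detected objects maps to that binary vector (the category order is an assumption carried over from the example above):

#a minimal sketch: encoding detected objects as a binary label vector
#the category order (dog, human, bicycle, truck) is assumed from the example above
categories = ["dog", "human", "bicycle", "truck"]
detected = {"dog", "bicycle", "truck"}

binary_vector = [1 if c in detected else 0 for c in categories]
print(binary_vector)  #[1, 0, 1, 1]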

The primary and most frequently used metrics in Multi-Label Classification are the following:

  1. Precision at k (P@k)

  2. Average Precision at k (AP@k)

  3. Mean Average Precision at k (MAP@k)

  4. Sampled F1 Score

Let’s get into the details of these metrics.


Precision at k (P@k):

Given a list of actual classes and a list of predicted classes, precision at k is defined as the number of correct predictions, considering only the top k elements of the prediction list, divided by k. The values range between 0 and 1.

Here is an example explaining the same in code:

def patk(actual, pred, k):
    #we return 0 if k is 0 because
    #we can't divide the number of common values by 0
    if k == 0:
        return 0

    #taking only the top k predictions in a class
    k_pred = pred[:k]

    #taking the set of the actual values
    actual_set = set(actual)

    #taking the set of the predicted values
    pred_set = set(k_pred)

    #taking the intersection of the actual set and the pred set
    #to find the common values
    common_values = actual_set.intersection(pred_set)

    return len(common_values) / len(k_pred)

#defining the values of the actual and the predicted class
y_true = [1, 2, 0]
y_pred = [1, 1, 0]

if __name__ == "__main__":
    print(patk(y_true, y_pred, 3))

Running the above code, we get the following result.

0.6666666666666666

In this case, the actual value 2 was predicted as 1, so only two of the three values are common to both lists, which brings the score down to 2/3.

Average Precision at k (AP@k):

It is defined as the average of all the precision values P@i for i from 1 to k. To make it clearer, let us look at some code. The values range between 0 and 1.

import numpy as np
#pk.py contains the patk function defined above
import pk

def apatk(actual, pred, k):
    #creating a list for storing the precision values for each i from 1 to k
    precision_ = []
    for i in range(1, k+1):
        #calculating the precision at different values of i
        #and appending them to the list
        precision_.append(pk.patk(actual, pred, i))

    #return 0 if there are no values in the list
    if len(precision_) == 0:
        return 0

    #returning the average of all the precision values
    return np.mean(precision_)

#defining the values of the actual and the predicted class
y_true = [[1,2,0,1], [0,4], [3], [1,2]]
y_pred = [[1,1,0,1], [1,4], [2], [1,3]]

if __name__ == "__main__":
    for i in range(len(y_true)):
        for j in range(1, 4):
            print(
                f"""
                y_true = {y_true[i]}
                y_pred = {y_pred[i]}
                AP@{j} = {apatk(y_true[i], y_pred[i], k=j)}
                """
            )

Here we check the AP@k for k from 1 to 3. We get the following output.

y_true = [1, 2, 0, 1]
y_pred = [1, 1, 0, 1]
AP@1 = 1.0

y_true = [1, 2, 0, 1]
y_pred = [1, 1, 0, 1]
AP@2 = 0.75

y_true = [1, 2, 0, 1]
y_pred = [1, 1, 0, 1]
AP@3 = 0.7222222222222222

y_true = [0, 4]
y_pred = [1, 4]
AP@1 = 0.0

y_true = [0, 4]
y_pred = [1, 4]
AP@2 = 0.25

y_true = [0, 4]
y_pred = [1, 4]
AP@3 = 0.3333333333333333

y_true = [3]
y_pred = [2]
AP@1 = 0.0

y_true = [3]
y_pred = [2]
AP@2 = 0.0

y_true = [3]
y_pred = [2]
AP@3 = 0.0

y_true = [1, 2]
y_pred = [1, 3]
AP@1 = 1.0

y_true = [1, 2]
y_pred = [1, 3]
AP@2 = 0.75

y_true = [1, 2]
y_pred = [1, 3]
AP@3 = 0.6666666666666666

This gives us a clear understanding of how the code works.


Mean Average Precision at k (MAP@k):

The mean of all the AP@k values over the whole training data is known as MAP@k. It gives us an accurate representation of the accuracy of the whole prediction data. Here is the code for it.

The values range between 0 and 1.


import numpy as np
#apk.py contains the apatk function defined above
import apk

def mapk(actual, pred, k):
    #creating a list for storing the Average Precision values
    average_precision = []
    #iterating through the whole data and calculating the AP@k for each sample
    for i in range(len(actual)):
        average_precision.append(apk.apatk(actual[i], pred[i], k))

    #returning the mean of all the Average Precision values
    return np.mean(average_precision)

#defining the values of the actual and the predicted class
y_true = [[1,2,0,1], [0,4], [3], [1,2]]
y_pred = [[1,1,0,1], [1,4], [2], [1,3]]

if __name__ == "__main__":
    print(mapk(y_true, y_pred, 3))

Running the above code, we get the output as follows.


0.4305555555555556

Here, the score is bad as the prediction set has many errors.


Sampled F1 Score (F1 Samples):

This metric calculates the F1 score for each instance in the dataset and then averages these scores. We will use sklearn's implementation of it in our code.

The F1 score is covered in detail in sklearn's documentation. The scores range between 0 and 1.

First, the data is converted into a binary (multi-hot) format; the F1 score is then computed on these binarized values. This gives us the required output.
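To see what this binarization looks like, here is a small illustrative snippet (using the first two label lists from the example below):

from sklearn.preprocessing import MultiLabelBinarizer

#illustrative only: binarizing two of the label lists used below
mlb = MultiLabelBinarizer()
print(mlb.fit_transform([[1, 2, 0, 1], [0, 4]]))
#the learned label space is [0, 1, 2, 4], so the output is
#[[1 1 1 0]
# [1 0 0 1]]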

from sklearn.metrics import f1_score
from sklearn.preprocessing import MultiLabelBinarizer

def f1_sampled(actual, pred):
    #converting the multi-label classification to a binary output
    #fitting the binarizer on both lists so that the actual and the
    #predicted values are encoded in the same label space
    mlb = MultiLabelBinarizer()
    mlb.fit(actual + pred)
    actual = mlb.transform(actual)
    pred = mlb.transform(pred)

    #calculating the f1 score, averaged per sample
    f1 = f1_score(actual, pred, average="samples")
    return f1

#defining the values of the actual and the predicted class
y_true = [[1,2,0,1], [0,4], [3], [1,2]]
y_pred = [[1,1,0,1], [1,4], [2], [1,3]]

if __name__ == "__main__":
    print(f1_sampled(y_true, y_pred))

The output of the above code will be the following:

0.45

Since the F1 score ranges between 0 and 1, a score of 0.45 is on the low side. This is because the prediction set contains many errors; with a better prediction set, the F1 score would be closer to 1.

Therefore, for this kind of problem, we generally use metrics such as Mean Average Precision at k, the sampled F1 score, and Log Loss. Setting up metrics appropriate to your specific problem is essential.
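Log Loss is not implemented above; as a minimal sketch (the binarized ground truth and predicted probabilities here are hypothetical, since the examples in this post only contain hard label lists), a multi-label log loss could be computed by averaging the binary log loss over the label columns:

import numpy as np
from sklearn.metrics import log_loss

#hypothetical binarized ground truth and predicted probabilities
#(rows = samples, columns = labels)
y_true_bin = np.array([[1, 0, 1],
                       [0, 1, 0]])
y_prob = np.array([[0.9, 0.2, 0.7],
                   [0.1, 0.8, 0.3]])

#averaging the binary log loss over the label columns
losses = [log_loss(y_true_bin[:, j], y_prob[:, j], labels=[0, 1])
          for j in range(y_true_bin.shape[1])]
print(np.mean(losses))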

I want to express my gratitude to Abhishek for his book, Approaching (Any) Machine Learning Problem, without which this blog would not have been possible.

Translated from the Analytics Vidhya article "Metrics for Multi-Label Classification".

Several key metrics used when evaluating model performance in multi-label classification are accuracy, precision, recall, and the F1 score.

Accuracy measures the proportion of samples, both positive and negative, that the model predicts correctly.

Precision is the proportion of samples correctly predicted as positive among all samples predicted as positive.

Recall is the proportion of samples correctly predicted as positive among all actual positive samples.

The F1 score is the harmonic mean of precision and recall, and it plays an important role in balancing the two.
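As a small worked example of these definitions (the counts here are hypothetical):

#hypothetical counts for a single label
tp, fp, fn = 8, 2, 4  #true positives, false positives, false negatives

precision = tp / (tp + fp)                          #8/10 = 0.8
recall = tp / (tp + fn)                             #8/12 = 0.667
f1 = 2 * precision * recall / (precision + recall)  #= 0.727
print(precision, recall, f1)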
