What Is Transfer Learning and Why Should I Care?
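Author: 禅与计算机程序设计艺术
1. Introduction
In deep learning, transfer learning is a widely studied topic. It lets us carry knowledge learned on one task over to another, related task, which can substantially reduce the amount of labeled data required and often improves generalization on the target task. How and why it works, however, is not fully understood. This article introduces the basic concepts of transfer learning and explains why it is a useful tool for building complex models. We discuss several approaches (feature extraction, fine-tuning, and multi-task learning) and explain how each works and what it offers. We also look at challenges that come up in practice, such as convergence and overfitting, and at possible remedies such as domain adaptation and unsupervised pre-training. The aim is to give readers a well-rounded understanding from which to explore the technique's potential across applications.
2. Basic Concepts and Terminology
2.1 Machine Learning
Machine learning (ML) is a subfield of artificial intelligence that lets computers learn from data without being explicitly programmed. Its goal is to have machines recognize patterns in past experience and use them to make predictions on new data. Machine learning algorithms use statistical modeling to uncover the relationship between inputs and outputs. Three typical problem types are classification, regression, and clustering. Classification predicts a discrete outcome from the input features, regression estimates a continuous value, and clustering groups similar examples based on their features. Although each setting brings its own challenges and requirements, machine learning has been applied to practical problems in many industries. Popular frameworks include TensorFlow, PyTorch, Scikit-learn, and Keras.
2.2 Deep Learning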
Deep learning (DL) is a branch of machine learning that uses multi-layer neural networks to tackle complex problems. A neural network is organized as layers of interconnected nodes; each node receives inputs from other nodes and produces an output using weights learned during training. The design is loosely inspired by the brain, in which very large numbers of neurons operate in parallel. Deep learning has been applied successfully to image recognition, natural language processing, speech recognition, and machine translation, among other tasks. Widely used DL libraries include TensorFlow, PyTorch, and Keras.
2.3 Transfer Learning
Transfer learning is a machine learning strategy in which a pre-trained model serves as the starting point for a target model. The central idea is to transfer knowledge gained from solving one problem to improve performance on a related but distinct task. For example, to classify images of animals, one might start from a convolutional neural network (CNN) pre-trained on a large dataset of dog images and fine-tune it for the specific classification task. Transfer learning is most valuable when the two tasks have a lot in common, that is, when they are similar or differ only in minor ways. It is particularly useful when labeled data for the target task is limited and training a large model from scratch would be costly.
2.4 Feature Extraction, Fine-Tuning, and Multi-Task Learning
There are three main transfer learning strategies:
- Feature Extraction: The pre-trained layers are frozen and used as a fixed feature extractor. The original output layer is removed, and the features produced by the remaining layers are fed into a new fully connected layer or softmax classifier, which is the only part trained on the target task.
- Fine-Tuning: Rather than keeping the whole pre-trained network fixed, we unfreeze a few of its top layers and retrain them together with the new output layers on the target task, usually with a lower learning rate so the pre-trained weights change only gradually. Compared with training from scratch, this can speed up convergence, reduce overfitting, and produce accurate models quickly.
- Multi-Task Learning: A single model is trained on several related tasks at the same time. The tasks share the lower layers of the network, and each task gets its own fully connected head with its own loss. Because the shared layers must serve every task, they can learn more general representations than a single-headed model trained on one task alone.
3. Core Algorithms and Concrete Steps
In brief, transfer learning reuses what a model has already learned. By building on prior knowledge, we can cut the time and resources that training from scratch would require and concentrate on adapting specific parts of an existing model. There are several ways to implement it, each with its own benefits and drawbacks. The rest of this article looks at each strategy in turn and illustrates it with Python code snippets.
3.1 Feature Extraction
Feature extraction means taking the representations computed by a pre-trained model and reusing them for a new task. First, the pre-trained model is loaded without its original output layer, which is no longer needed, and its remaining layers are frozen. Each sample is then passed forward through these frozen layers to produce intermediate representations, commonly called "features". These features serve as input to a new fully connected layer or softmax classifier that makes the predictions.
The following implementation uses VGG16 as the pre-trained model in Keras.
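from keras.applications import VGG16
from keras.layers import Dense, Flatten
from keras.models import Model

# img_height, img_width and num_classes are assumed to be defined elsewhere,
# e.g. img_height = img_width = 224 for ImageNet-sized inputs.

# Load pre-trained VGG16 model without its ImageNet classifier
# (Keras expects input_shape as (height, width, channels))
vgg_model = VGG16(weights='imagenet', include_top=False, input_shape=(img_height, img_width, 3))
# Freeze every layer except the last four
for layer in vgg_model.layers[:-4]:
    layer.trainable = False
# Define custom FC layers on top of the convolutional features
fc1 = Flatten()(vgg_model.output)
fc2 = Dense(128, activation='relu')(fc1)
predictions = Dense(num_classes, activation='softmax')(fc2)
# Create custom model
custom_model = Model(inputs=vgg_model.input, outputs=predictions)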
The include_top parameter determines whether the default top (classifier) layers of the pre-trained model are included. With include_top=True, the model ends in a classification layer with 1000 neurons, one per ImageNet category. With include_top=False, those layers are omitted and the model outputs intermediate representations suitable for feature extraction. Dropping the top layers removes their parameters; the saving in training computation comes from marking the remaining layers as non-trainable (trainable = False), because frozen weights are not updated. Keras offers various pre-trained models, so the architecture may need slight adjustments depending on which one you choose.
Once the custom model is defined, it can be compiled in the usual way with the categorical_crossentropy loss and a suitable optimizer, then trained until the validation loss stops improving. The model is then evaluated on a test set using metrics such as accuracy, precision, recall, and F1 score to judge how well it performs on the target task.
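To make the mechanics of feature extraction concrete without downloading any weights, here is a minimal framework-free sketch. Everything in it is an illustrative assumption rather than part of the Keras example: the "backbone" is just a fixed random projection with ReLU standing in for pre-trained layers, the data is a toy two-cluster set, and the new head is a logistic-regression unit trained with plain SGD. The point it shows is that only the head's weights change during training.

```python
import math
import random

random.seed(0)

def make_point(label):
    # Toy 2-D sample: class 1 clusters around (3, 3), class 0 around (-3, -3).
    center = 3.0 if label == 1 else -3.0
    return [random.gauss(center, 1.0), random.gauss(center, 1.0)], label

data = [make_point(i % 2) for i in range(200)]
train, test = data[:150], data[150:]

# Hypothetical stand-in for a pre-trained backbone: 8 fixed random projections + ReLU.
BACKBONE = [[random.gauss(0.0, 1.0) for _ in range(2)] for _ in range(8)]
backbone_before = [row[:] for row in BACKBONE]  # copy, to check nothing changes

def extract_features(x):
    """Forward pass through the frozen backbone (its weights are never updated)."""
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in BACKBONE]

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def head_output(features, weights, bias):
    return sigmoid(sum(w * f for w, f in zip(weights, features)) + bias)

def train_head(samples, epochs=50, lr=0.01):
    """Train only the new logistic-regression head with plain SGD."""
    weights, bias = [0.0] * len(BACKBONE), 0.0
    for _ in range(epochs):
        for x, y in samples:
            features = extract_features(x)
            error = head_output(features, weights, bias) - y  # gradient of the loss w.r.t. the logit
            weights = [w - lr * error * f for w, f in zip(weights, features)]
            bias -= lr * error
    return weights, bias

weights, bias = train_head(train)
correct = sum(
    (head_output(extract_features(x), weights, bias) >= 0.5) == (y == 1)
    for x, y in test
)
accuracy = correct / len(test)
print(f"held-out accuracy: {accuracy:.2f}")
```

In a real project the backbone would be a network such as VGG16 and the head a Dense layer, but the division of labor is the same: fixed features in, small trainable classifier on top.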
3.2 Fine-Tuning
Fine-tuning retrains the upper layers of a pre-trained model for a particular task while keeping the lower layers intact. This helps the model converge and mitigates overfitting. Starting from a pre-trained model, we choose which layers to keep frozen and which to update, then retrain the unfrozen layers to optimize performance on the target task. The resulting model benefits from the knowledge stored in the pre-trained layers and so tends to reach good parameters more efficiently.
The following implementation uses ResNet50 as the pre-trained model in Keras.
from keras.applications import ResNet50
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten

# img_height, img_width, num_classes, epochs and the X_*/y_* image arrays
# (with one-hot labels) are assumed to be defined elsewhere.

# Load pre-trained ResNet50 model
resnet_model = ResNet50(weights='imagenet', include_top=False, input_shape=(img_height, img_width, 3))
# Freeze all layers except for the last ten
for layer in resnet_model.layers[:-10]:
    layer.trainable = False
# Stack custom top layers for the target task on the pre-trained base
custom_model = Sequential()
custom_model.add(resnet_model)
custom_model.add(Flatten())
custom_model.add(Dense(128, activation='relu'))
custom_model.add(Dropout(0.5))
custom_model.add(Dense(num_classes, activation='softmax'))
# Compile custom model
custom_model.compile(optimizer='adam',
                     loss='categorical_crossentropy',
                     metrics=['accuracy'])
# Train custom model
history = custom_model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=epochs)
Here we excluded the original classifier layers of the pre-trained model by setting include_top=False. We then defined a custom sequential model that flattens the final representation of the pre-trained model and adds a dense hidden layer with dropout regularization, followed by a softmax output layer for classification. The model is compiled with the Adam optimizer and the categorical cross-entropy loss, with accuracy as the evaluation metric.
After training, the model can be evaluated on a test set to check whether the improvement matches our expectations.
Fine-tuning does not guarantee better performance on every task, particularly when the pre-trained model was built for a very different objective. In practice, finding a good solution for a given task usually means experimenting with several hyperparameter settings and architectural choices.
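The freezing step used above relies on a slice such as `layers[:-10]`. The sketch below illustrates that idiom with a hypothetical `Layer` record (not a Keras class) and made-up layer names and parameter counts. It also sidesteps one pitfall of the slice form: `layers[:-0]` is an empty slice, so asking to keep "the last 0 layers" trainable would silently freeze nothing.

```python
from dataclasses import dataclass

@dataclass
class Layer:
    # Hypothetical stand-in for a framework layer object.
    name: str
    n_params: int
    trainable: bool = True

def freeze_all_but_last(layers, n_trainable):
    """Freeze every layer except the last `n_trainable` ones.

    An explicit cutoff index is used instead of layers[:-n_trainable],
    because layers[:-0] is empty and would leave every layer trainable.
    """
    cutoff = len(layers) - n_trainable
    for index, layer in enumerate(layers):
        layer.trainable = index >= cutoff
    return layers

def trainable_params(layers):
    """Total number of parameters that will still be updated."""
    return sum(layer.n_params for layer in layers if layer.trainable)

# Illustrative model: three "pre-trained" layers plus one new head.
model_layers = [
    Layer("conv1", 100),
    Layer("conv2", 200),
    Layer("conv3", 300),
    Layer("dense_head", 50),
]

freeze_all_but_last(model_layers, 2)
print([(layer.name, layer.trainable) for layer in model_layers])
print("trainable parameters:", trainable_params(model_layers))
```

Checking the trainable parameter count after freezing is a cheap way to confirm that the intended layers, and only those, will be updated during fine-tuning.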
3.3 Multi-Task Learning
Multi-task learning trains one model on several related tasks simultaneously. This can improve generalization, because every task contributes training signal to the shared part of the model instead of the model relying on a single source. Each task has its own dedicated head, with its own output and loss.
As an example, consider a model that must both detect faces and recognize emotional expressions in photographs. Instead of training two separate models, one on a face dataset and another on an emotion dataset, we can train a single model on a combined dataset labeled for both. At inference time the model produces probability scores for each task, so it reports both whether a face is present and which emotion is shown.
The following implementation of multi-task learning uses Xception as the pre-trained convolutional network in Keras.
from keras.applications import Xception
from keras.models import Model
from keras.layers import GlobalAveragePooling2D, Dense, Dropout

# img_height, img_width, num_emotions, epochs and the training arrays are assumed
# to be defined elsewhere. X_train / X_test hold images from the combined dataset;
# every image has both a face label (0/1) and an integer emotion label.

# Load pre-trained Xception model
xception_model = Xception(weights='imagenet', include_top=False, input_shape=(img_height, img_width, 3))
# Freeze every layer except the last 27
for layer in xception_model.layers[:-27]:
    layer.trainable = False
# Shared pooled features feed both task heads
features = GlobalAveragePooling2D()(xception_model.output)
# Face head: single sigmoid output
face = Dense(128, activation='relu')(features)
face = Dropout(0.5)(face)
face_output = Dense(1, activation='sigmoid', name='face')(face)
# Emotion head: softmax over emotion classes
emotion = Dense(64, activation='relu')(features)
emotion = Dropout(0.5)(emotion)
emotion_output = Dense(num_emotions, activation='softmax', name='emotion')(emotion)
# Combine pre-trained base and custom heads into a single two-output model
combined_model = Model(inputs=xception_model.input, outputs=[face_output, emotion_output])
# Compile combined model; dictionary keys match the output layer names
combined_model.compile(optimizer='adam',
                       loss={'face': 'binary_crossentropy',
                             'emotion': 'sparse_categorical_crossentropy'},
                       loss_weights={'face': 0.2,
                                     'emotion': 0.8},
                       metrics={'face': ['accuracy'],
                                'emotion': ['accuracy']})
# Train combined model
history = combined_model.fit(X_train,
                             {'face': y_faces_train,
                              'emotion': y_emotions_train},
                             validation_data=(X_test,
                                              {'face': y_faces_test,
                                               'emotion': y_emotions_test}),
                             epochs=epochs)
In this case, we load the pre-trained Xception model and freeze all but its final layers. We build two heads for our two target tasks, one for face detection and one for emotion recognition. Both heads are attached to the pre-trained base and merged into a single model using the Keras functional API.
We train the model with binary cross-entropy loss for face detection and sparse categorical cross-entropy loss for emotion recognition, weighting them 0.2 and 0.8 respectively to control how much each head contributes to the total loss. Accuracy is tracked as the evaluation metric. The combined model is trained with mini-batch gradient descent on the labeled data for both tasks.
During training, we track the progress of both heads through separate metrics. Once training has converged, we evaluate the full model on a test set to assess its overall quality.
As with fine-tuning, multi-task learning does not always bring significant improvements; how much it helps depends on how much the tasks actually have in common. Getting good results may require choosing the right mix of tasks, loss weights, and hyperparameters.
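The weighted objective described above can be written out by hand for a single example. The sketch below is framework-free and only illustrates the arithmetic; the function names are made up for this example, and the 0.2 / 0.8 weights are the ones used in the text.

```python
import math

def binary_cross_entropy(y_true, prob, eps=1e-7):
    """Loss for the face head: y_true is 0 or 1, prob is the sigmoid output."""
    prob = min(max(prob, eps), 1.0 - eps)  # keep log() finite
    return -(y_true * math.log(prob) + (1 - y_true) * math.log(1.0 - prob))

def sparse_categorical_cross_entropy(class_index, probs, eps=1e-7):
    """Loss for the emotion head: class_index is an integer label,
    probs is the softmax output over all emotion classes."""
    return -math.log(max(probs[class_index], eps))

def multitask_loss(face_label, face_prob, emotion_label, emotion_probs,
                   w_face=0.2, w_emotion=0.8):
    """Weighted sum of the two per-task losses."""
    face_loss = binary_cross_entropy(face_label, face_prob)
    emotion_loss = sparse_categorical_cross_entropy(emotion_label, emotion_probs)
    return w_face * face_loss + w_emotion * emotion_loss

# One example: the face head is undecided (0.5) and the emotion head is
# uniform over four classes, so the losses are ln(2) and ln(4) respectively.
total = multitask_loss(face_label=1, face_prob=0.5,
                       emotion_label=2, emotion_probs=[0.25, 0.25, 0.25, 0.25])
print(f"total loss: {total:.4f}")
```

Raising one weight relative to the other shifts how strongly that task's errors pull on the shared layers, which is why the weights are usually treated as hyperparameters to tune.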
4. Code Examples and Explanations
In addition to explaining fundamental concepts and technical aspects, we will also demonstrate practical implementations of feature extraction, fine-tuning, and multi-task learning using widely-used deep learning frameworks such as Keras and PyTorch. These demonstrations will highlight the essential syntax and application of each technique, offering valuable guidance for implementing similar approaches in your own projects.
Let's start by importing the required libraries and defining necessary variables.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from tensorflow.keras.utils import to_categorical

# Set random seed for reproducibility
np.random.seed(42)
# Set number of classes and samples per class
num_classes = 2
samples_per_class = 100
# Generate synthetic dataset for demonstration purposes
X, y = make_blobs(n_samples=num_classes*samples_per_class, centers=num_classes, n_features=2, cluster_std=2, random_state=42)
y = to_categorical(y)  # one-hot encode the labels
# Shuffle, then split 80/20 into training and validation sets
shuffle_idx = np.arange(len(y))
np.random.shuffle(shuffle_idx)
X, y = X[shuffle_idx], y[shuffle_idx]
split = int(0.8 * len(X))
X_train, y_train = X[:split], y[:split]
X_val, y_val = X[split:], y[split:]
This code generates a synthetic dataset of Gaussian blobs, one cluster of points per class. The dataset is then randomly shuffled and divided into a training set (X_train and y_train) and a validation set (X_val and y_val).
Now let's generate plots to visualize our synthetic dataset:
fig, ax = plt.subplots(figsize=(8, 8))
# Color each point by its class (recovered from the one-hot labels)
ax.scatter(X[:, 0], X[:, 1], c=y.argmax(axis=-1), cmap="jet")
ax.set_xlabel("Feature 1")
ax.set_ylabel("Feature 2")
plt.show()
With the toy dataset in place, we now turn to feature extraction, fine-tuning, and multi-task learning, using Keras and PyTorch in turn.
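4.1 Feature Extraction in Keras
The most basic feature extraction setup in Keras relies on a pre-trained VGG model. We can import VGG16 from the Keras library and freeze all but its last four layers so that the frozen weights are not updated. We can then build custom fully connected layers on top to perform the classification task.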
from tensorflow.keras.applications import VGG16
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Flatten, Dense

# img_height, img_width and num_classes are assumed to be defined elsewhere.

# Load pre-trained VGG16 model
base_model = VGG16(weights='imagenet', include_top=False, input_shape=(img_height, img_width, 3))
# Freeze every layer except the last four
for layer in base_model.layers[:-4]:
    layer.trainable = False
# Define custom FC layers
inputs = Input(shape=(img_height, img_width, 3))
x = base_model(inputs)
x = Flatten()(x)
outputs = Dense(num_classes, activation='softmax')(x)
# Create custom model
model = Model(inputs=inputs, outputs=outputs)
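This example uses the Keras API bundled with TensorFlow (tensorflow.keras); switching between it and standalone Keras is mostly a matter of changing the import paths.
We can achieve the same thing by loading a pre-trained ResNet50 model: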
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Flatten, Dense

# img_height, img_width and num_classes are assumed to be defined elsewhere.

# Load pre-trained ResNet50 model
base_model = ResNet50(weights='imagenet', include_top=False, input_shape=(img_height, img_width, 3))
# Freeze all layers except for the last ten
for layer in base_model.layers[:-10]:
    layer.trainable = False
# Add custom top layers for the target task
inputs = Input(shape=(img_height, img_width, 3))
x = base_model(inputs)
x = Flatten()(x)
outputs = Dense(num_classes, activation='softmax')(x)
# Create custom model
model = Model(inputs=inputs, outputs=outputs)
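The resulting model outputs a num_classes-dimensional vector holding the predicted probability of each class for the input image.
4.2 Feature Extraction in PyTorch
We look at two ways of using a pre-trained model in PyTorch: the first builds on the classic pre-trained VGG network, and the second is described in terms of a visual attention mechanism.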
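Method 1: Pre-trained VGG model
The vgg16() function in PyTorch's torchvision.models package returns a pre-trained VGG model. For ResNet-based models, the corresponding functions return their pre-trained versions in the same way.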
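import torch
import torch.nn as nn
import torchvision.models as models

# num_classes is assumed to be defined elsewhere.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
# Load pre-trained VGG16 model
# (torchvision releases before 0.13 use models.vgg16(pretrained=True) instead)
vgg16 = models.vgg16(weights=models.VGG16_Weights.DEFAULT).to(device)
# Freeze all pre-trained parameters
for param in vgg16.parameters():
    param.requires_grad_(False)
# Replace the last layer with custom FC layers for classification;
# newly created layers are trainable by default
classifier = nn.Sequential(nn.Linear(in_features=4096, out_features=256, bias=True),
                           nn.ReLU(),
                           nn.Dropout(p=0.5),
                           nn.Linear(in_features=256, out_features=num_classes, bias=True)).to(device)
vgg16.classifier[-1] = classifier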
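Here we rely on CUDA, which lets us run the neural network on a GPU.
Method 2: Visual attention mechanism
A visual attention mechanism can be framed in reinforcement-learning terms: an agent learns to recognize objects in an image and receives reward or penalty signals based on the position and shape of what it recognizes. Such a mechanism is meant to help the agent understand a scene more accurately and make decisions more efficiently. The code below covers only the surrounding steps: it preprocesses an image and runs it through the pre-trained model to obtain predictions.
Using the PIL library, we read an image file and preprocess it.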
import PIL.Image
import torch
import torchvision.transforms as transforms

# Assumptions: image_path points to an image file, vgg16 is the pre-trained model
# with its original 1000-class ImageNet classifier, and imagenet_classes is a list
# of the 1000 ImageNet label names. None of these are defined in this snippet.

# Read image file
image = PIL.Image.open(image_path).convert('RGB')
# Preprocess image
preprocess = transforms.Compose([
    transforms.Resize(224),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
input_tensor = preprocess(image)
input_batch = input_tensor.unsqueeze(0)  # create a mini-batch as expected by the model
# Move input and model to GPU for speed if available
if torch.cuda.is_available():
    input_batch = input_batch.to('cuda')
    vgg16.to('cuda')
vgg16.eval()  # disable dropout for inference
with torch.no_grad():
    output = vgg16(input_batch)
probs = torch.nn.functional.softmax(output[0], dim=0)
# Output the top five predicted labels with probability and percentage
top_probs, top_indices = torch.topk(probs, k=5)
for idx, prob in zip(top_indices.tolist(), top_probs.tolist()):
    print((imagenet_classes[idx], prob, prob * 100))
This code prints the label, probability, and percentage for the five most likely classes.
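The post-processing step (turning raw model outputs into ranked labels) does not depend on PyTorch. As a minimal sketch, assuming a plain list of logits and a matching list of label names (both invented here for illustration), the softmax and top-k selection look like this:

```python
import math

def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k(logits, labels, k=5):
    """Return the k most probable (label, probability) pairs, best first."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(labels[i], probs[i]) for i in ranked[:k]]

# Hypothetical logits for a four-class model.
example_labels = ["cat", "dog", "fox", "owl"]
example_logits = [1.0, 3.0, 2.0, -1.0]

for label, prob in top_k(example_logits, example_labels, k=3):
    print(f"{label}: {prob * 100:.1f}%")
```

Note that the probabilities are computed once over all classes and then ranked; applying softmax again to only the selected entries would renormalize them and overstate the model's confidence.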
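4.3 Fine-Tuning in Keras
Fine-tuning is an important technique for adapting a neural network to a new task.
Suppose the goal is to tell cats from dogs. We already have a VGG-based model pre-trained for image classification, so for the cat-versus-dog task we only need to adjust the last few layers and the new fully connected layers to fine-tune the model.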
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.applications import VGG16
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Flatten, Dense, Dropout

# img_height, img_width and num_classes are assumed to be defined elsewhere.
# X_train / X_val are assumed to be image arrays of shape
# (n, img_height, img_width, 3) with one-hot labels y_train / y_val.

# Load pre-trained VGG16 model
base_model = VGG16(weights='imagenet', include_top=False, input_shape=(img_height, img_width, 3))
# Freeze all layers except for the last five
for layer in base_model.layers[:-5]:
    layer.trainable = False
# Add custom top layers for the target task
inputs = Input(shape=(img_height, img_width, 3))
x = base_model(inputs)
x = Flatten()(x)
x = Dense(512, activation='relu')(x)
x = Dropout(0.5)(x)
outputs = Dense(num_classes, activation='softmax')(x)
# Create custom model
model = Model(inputs=inputs, outputs=outputs)
# Compile custom model with a small learning rate
model.compile(loss='categorical_crossentropy',
              optimizer=SGD(learning_rate=0.001, momentum=0.9),
              metrics=['accuracy'])
# Continue training custom model on new task
history = model.fit(X_train, y_train,
                    validation_data=(X_val, y_val),
                    epochs=20)
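The lower convolutional layers keep the parameters they learned on ImageNet; we only add a few fully connected layers on top and fine-tune. Training then continues on the new data so we can evaluate how much performance on the target task improves.
4.4 Fine-Tuning in PyTorch
Fine-tuning in PyTorch follows the same pattern as in Keras: freeze the parameters of all layers except the last few, and add a new set of fully connected layers on top of the model.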
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision.models as models

# num_classes, dataloader and dataset are assumed to be defined elsewhere.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
# Load pre-trained VGG16 model
# (torchvision releases before 0.13 use models.vgg16(pretrained=True) instead)
vgg16 = models.vgg16(weights=models.VGG16_Weights.DEFAULT).to(device)
# Freeze all pre-trained parameters
for param in vgg16.parameters():
    param.requires_grad_(False)
# Replace last layer with custom FC layers for classification (trainable by default)
classifier = nn.Sequential(nn.Linear(in_features=4096, out_features=256, bias=True),
                           nn.ReLU(),
                           nn.Dropout(p=0.5),
                           nn.Linear(in_features=256, out_features=num_classes, bias=True)).to(device)
vgg16.classifier[-1] = classifier
criterion = nn.CrossEntropyLoss()
# Optimize only the parameters that still require gradients
trainable = [p for p in vgg16.parameters() if p.requires_grad]
optimizer = optim.SGD(params=trainable, lr=0.001, momentum=0.9)
vgg16.train()
# Continue training custom model on new task
for epoch in range(20):
    running_loss = 0.0
    num_correct = 0
    # Iterate over data
    for inputs, labels in dataloader:
        inputs, labels = inputs.to(device), labels.to(device)
        # Zero the parameter gradients
        optimizer.zero_grad()
        # Forward + backward + optimize
        outputs = vgg16(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        # Statistics
        running_loss += loss.item() * inputs.size(0)
        num_correct += (outputs.argmax(dim=1) == labels).sum().item()
    print('[Epoch %d] Loss: %.3f | Acc: %.3f' % (epoch + 1, running_loss / len(dataset), num_correct / len(dataset)))
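There is one notable difference between Keras and PyTorch. In Keras, fit() runs the training loop and computes the loss for us, whereas in PyTorch we write the loop ourselves. We therefore have to define the loss function and optimizer explicitly and update the model parameters as we iterate over the dataset.
4.5 Multi-Task Learning in Keras
Multi-task learning is a machine learning technique with considerable practical potential. It lets a single model handle several related tasks at once. For example, one model can recognize both an object and the action it is performing.
In Keras, the most direct approach is to define a separate top network (head) for each task and combine the heads into one model on a shared base.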
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import GlobalAveragePooling2D, Dense, Dropout
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

# img_height, img_width, num_emotions and the training arrays are assumed to be
# defined elsewhere; every image has a face label (0/1) and an integer emotion label.

# Load pre-trained VGG16 model
base_model = VGG16(weights='imagenet', include_top=False, input_shape=(img_height, img_width, 3))
# Freeze every layer except the last four
for layer in base_model.layers[:-4]:
    layer.trainable = False
# Shared pooled features feed both task heads
features = GlobalAveragePooling2D()(base_model.output)
# Face head
face = Dense(128, activation='relu')(features)
face = Dropout(0.5)(face)
face_output = Dense(1, activation='sigmoid', name='face')(face)
# Emotion head
emotion = Dense(64, activation='relu')(features)
emotion = Dropout(0.5)(emotion)
emotion_output = Dense(num_emotions, activation='softmax', name='emotion')(emotion)
# Combine pre-trained base and custom heads into a single two-output model
combined_model = Model(inputs=base_model.input, outputs=[face_output, emotion_output])
# Compile combined model with one loss per named output
combined_model.compile(loss={'face': 'binary_crossentropy',
                             'emotion': 'sparse_categorical_crossentropy'},
                       optimizer=Adam(learning_rate=0.001),
                       metrics={'face': ['accuracy'], 'emotion': ['accuracy']})
# Train combined model
history = combined_model.fit(X_train, {'face': y_faces_train, 'emotion': y_emotions_train},
                             validation_data=(X_val, {'face': y_faces_val, 'emotion': y_emotions_val}),
                             epochs=20)
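With Keras, assembling this model takes only a few lines of code. The resulting model attends to face recognition and emotion recognition at the same time, and we train it with the fit() method.
4.6 Multi-Task Learning in PyTorch
As in Keras, there are several ways to do multi-task learning in PyTorch. PyTorch has no built-in equivalent of a multi-output fit(), however, so we write the training code ourselves.
In the example below, each batch from the DataLoader carries labels for both tasks, and a separate head is trained for each task. When designing the model, we need to consider:
- the shared base network;
- the output size of each task;
- how to combine the outputs.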
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision.models as models

# num_emotions, dataloader and dataset are assumed to be defined elsewhere; each
# batch is assumed to be (images, {'face': 0/1 labels, 'emotion': class indices}).
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
# Load pre-trained VGG16 model and freeze all of its parameters
# (torchvision releases before 0.13 use models.vgg16(pretrained=True) instead)
vgg16 = models.vgg16(weights=models.VGG16_Weights.DEFAULT).to(device)
for param in vgg16.parameters():
    param.requires_grad_(False)
# Drop the 1000-class ImageNet layer so the backbone outputs 4096-d features
vgg16.classifier[6] = nn.Identity()
vgg16.eval()
# One custom head per task
face_head = nn.Sequential(nn.Linear(in_features=4096, out_features=128, bias=True),
                          nn.ReLU(),
                          nn.Dropout(p=0.5),
                          nn.Linear(in_features=128, out_features=1, bias=True)).to(device)
emotion_head = nn.Sequential(nn.Linear(in_features=4096, out_features=64, bias=True),
                             nn.ReLU(),
                             nn.Dropout(p=0.5),
                             nn.Linear(in_features=64, out_features=num_emotions, bias=True)).to(device)
# Initialize optimizers and loss functions for both heads
face_criterion = nn.BCEWithLogitsLoss()
face_optimizer = optim.SGD(params=face_head.parameters(), lr=0.001, momentum=0.9)
emotion_criterion = nn.CrossEntropyLoss()
emotion_optimizer = optim.SGD(params=emotion_head.parameters(), lr=0.001, momentum=0.9)
for epoch in range(20):
    # Keep track of losses and accuracies for both heads
    face_running_loss = emotion_running_loss = 0.0
    face_num_correct = emotion_num_correct = 0
    for inputs, labels in dataloader:
        inputs = inputs.to(device)
        face_labels = labels['face'].float().view(-1, 1).to(device)
        emotion_labels = labels['emotion'].long().to(device)
        # Zero the parameter gradients
        face_optimizer.zero_grad()
        emotion_optimizer.zero_grad()
        # Shared forward pass through the frozen backbone (no gradients needed)
        with torch.no_grad():
            features = vgg16(inputs)
        face_logits = face_head(features)
        emotion_logits = emotion_head(features)
        # The heads share no trainable parameters, so each is updated from its own loss
        face_loss = face_criterion(face_logits, face_labels)
        face_loss.backward()
        face_optimizer.step()
        emotion_loss = emotion_criterion(emotion_logits, emotion_labels)
        emotion_loss.backward()
        emotion_optimizer.step()
        # Statistics
        batch_size = inputs.size(0)
        face_running_loss += face_loss.item() * batch_size
        emotion_running_loss += emotion_loss.item() * batch_size
        face_preds = (torch.sigmoid(face_logits) > 0.5).float()
        face_num_correct += (face_preds == face_labels).sum().item()
        emotion_num_correct += (emotion_logits.argmax(dim=1) == emotion_labels).sum().item()
    # Print statistics after each epoch
    n = len(dataset)
    print('[Epoch %d] Face Loss: %.3f | Acc: %.3f' % (epoch + 1, face_running_loss / n, face_num_correct / n))
    print('[Epoch %d] Emotion Loss: %.3f | Acc: %.3f' % (epoch + 1, emotion_running_loss / n, emotion_num_correct / n))
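The code above shows one way to apply multi-task learning in PyTorch: a single loop iterates over batches that carry labels for both tasks and trains a separate head for each.
To support multi-task learning, the model has several output branches, and each branch is given its own loss function and optimizer. Writing the code that combines the branches' outputs is the central step in completing such a system.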
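The three design questions listed for this section (shared base, per-task output size, combining the outputs) can be seen in a forward pass alone. The sketch below is framework-free and purely illustrative: the layer sizes, the random initial weights, and the names `trunk`, `face` and `emotion` are assumptions chosen for the example, not part of the PyTorch code above.

```python
import math
import random

random.seed(0)

def linear(x, weights, bias):
    """Dense layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def relu(values):
    return [max(0.0, v) for v in values]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(logits):
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def init(n_out, n_in):
    """Small random weights and zero biases for an n_in -> n_out layer."""
    weights = [[random.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

NUM_EMOTIONS = 3
trunk_w, trunk_b = init(4, 3)                 # shared base: 3 inputs -> 4 features
face_w, face_b = init(1, 4)                   # face head: 1 logit
emotion_w, emotion_b = init(NUM_EMOTIONS, 4)  # emotion head: one logit per class

def forward(x):
    """Compute the shared features once, then feed them to both heads."""
    shared = relu(linear(x, trunk_w, trunk_b))
    face_prob = sigmoid(linear(shared, face_w, face_b)[0])
    emotion_probs = softmax(linear(shared, emotion_w, emotion_b))
    return {"face": face_prob, "emotion": emotion_probs}

outputs = forward([0.2, -1.0, 0.7])
print("face probability:", round(outputs["face"], 3))
print("emotion probabilities:", [round(p, 3) for p in outputs["emotion"]])
```

Returning the outputs as a dictionary keyed by task name keeps the pairing with per-task labels and losses explicit, which mirrors how the batches carry `labels['face']` and `labels['emotion']`.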
