What Is Transfer Learning and Why Should I Care?

Author: 禅与计算机程序设计艺术

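1. Introduction

Transfer learning is a widely studied topic in deep learning. It lets us carry knowledge learned on one task over to a related task, which can substantially reduce the amount of labeled data required and often improves generalization. How and why the technique works, however, is still not fully understood. This article introduces the basic concepts of transfer learning and explains why it is a useful tool for building complex models. We discuss three approaches in detail (feature extraction, fine-tuning, and multi-task learning) and explain how each works and what it offers. We also look at challenges that arise in practice, such as convergence and overfitting, and point to possible remedies such as domain adaptation and unsupervised pre-training. The goal is to give readers a well-rounded understanding from which to explore the potential of transfer learning in their own applications.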

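2. Basic Concepts and Terminology

2.1 Machine Learning

Machine learning (ML) is a subfield of artificial intelligence that lets computers learn from data without being explicitly programmed. Its goal is to have a machine recognize patterns in past experience and make predictions on new data. ML algorithms use statistical modeling to uncover the relationship between inputs and outputs. Three typical problem types are classification, regression, and clustering: classification predicts a discrete outcome from input features, regression estimates a continuous value, and clustering groups similar examples together based on their features. Despite differing challenges and requirements, ML algorithms have been applied successfully to practical problems across many industries. Popular frameworks include TensorFlow, PyTorch, Scikit-learn, and Keras.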

2.2 Deep Learning

Deep learning (DL) is a branch of machine learning that uses multi-layer neural networks to solve complex problems. A neural network is organized as layers of interconnected nodes; each node receives inputs from other nodes and produces an output by combining them through weights that are learned during training. Neural networks are loosely inspired by the way the human brain works, with many simple units operating in parallel to carry out a complex task. Deep learning has been applied successfully to a wide range of tasks, including image recognition, natural language processing, speech recognition, and translation. Widely used DL libraries include TensorFlow, PyTorch, and Keras.
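As a rough illustration of the layered structure described above, the following NumPy sketch runs a forward pass through a tiny two-layer network. The layer sizes, the random weights, and the helper name `forward` are assumptions chosen for illustration; in a real network the weights would be learned during training.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 4 input features, 8 hidden units, 3 outputs.
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 3)), np.zeros(3)

def forward(x):
    """Forward pass: each layer combines its inputs through weights."""
    hidden = np.maximum(0.0, x @ W1 + b1)  # ReLU activation
    return hidden @ W2 + b2                # raw output scores (logits)

batch = rng.normal(size=(5, 4))            # 5 samples, 4 features each
scores = forward(batch)
print(scores.shape)
```

Each row of `scores` holds one raw score per output unit for the corresponding input sample.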

2.3 Transfer Learning

Transfer learning is a machine learning strategy in which a pre-trained model serves as the starting point for a target model. The central idea is to transfer knowledge gained from solving one problem to improve performance on a related but distinct task. For example, to classify images of animals, you might start from a convolutional neural network (CNN) pre-trained on a large dataset of dog images and fine-tune it for your specific classification task. Transfer learning is most valuable when the two tasks have much in common, i.e., they are closely related or differ only in minor ways. It is especially useful when the target task is complex but labeled target data is scarce, because the pre-trained model has already learned general-purpose features from a much larger dataset.

2.4 Feature Extraction, Fine-Tuning, and Multi-Task Learning

There are three main transfer learning strategies:

Feature Extraction: Instead of retraining the pre-trained model, we freeze its layers, remove the original output layer, and use the activations of the remaining layers as features. These extracted features are then fed into a new fully connected layer or softmax classifier, which is the only part trained on the target task.
Fine-Tuning: Rather than training a CNN from scratch, we start from the pre-trained weights and retrain only part of the network. We keep the lower layers frozen, unfreeze the top few layers, and continue training them together with the new classifier on the target task using a lower learning rate. This strategy can help speed up convergence, reduce overfitting, and let us build accurate models quickly.
Multi-Task Learning: Here a single model is trained on several related tasks at the same time. The tasks share the lower layers of the network, while each task gets its own fully connected head and its own loss. Because the shared layers must serve every task, they tend to learn more general features, which can lead to better generalization than training a separate single-headed model per task.
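One way to see the practical difference between the first two strategies is to ask which backbone layers receive weight updates. The sketch below is a framework-free illustration; the function name `trainable_flags` and the strategy labels are hypothetical, not part of Keras or PyTorch.

```python
def trainable_flags(num_backbone_layers, strategy, num_unfrozen=0):
    """Return one bool per backbone layer: True means the layer is updated."""
    if strategy == "feature_extraction":
        # The backbone is fully frozen; only a new head (not listed) is trained.
        return [False] * num_backbone_layers
    if strategy == "fine_tuning":
        # Lower layers stay frozen; the top `num_unfrozen` layers are retrained.
        num_unfrozen = min(num_unfrozen, num_backbone_layers)
        frozen = num_backbone_layers - num_unfrozen
        return [False] * frozen + [True] * num_unfrozen
    raise ValueError(f"unknown strategy: {strategy}")

print(trainable_flags(5, "feature_extraction"))
print(trainable_flags(5, "fine_tuning", num_unfrozen=2))
```

Multi-task learning is orthogonal to this choice: it concerns how many heads sit on top of the shared layers, and either freezing scheme can be combined with it.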

3. Core Algorithm Principles and Concrete Steps

In brief, transfer learning is a powerful strategy that reuses what an existing model has already learned. By borrowing from prior knowledge, we can cut the time and resources normally spent on training new models from scratch and concentrate instead on adapting specific components of an existing model. There are several ways to implement transfer learning, each with its own benefits and drawbacks. The following sections look at each strategy in turn and illustrate its practical application with Python code snippets.

3.1 Feature Extraction

Feature extraction involves the process of obtaining features from a pre-trained model for application in a new task. Initially, the pre-trained model is loaded, and all layers except those essential for the desired output are fixed. Subsequently, a sample undergoes forward propagation through these fixed layers to generate intermediate representations, commonly referred to as "features." The output layer is subsequently omitted since it is no longer required. These features can now be employed as input to a new fully connected layer or a softmax-based classifier for prediction purposes.
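The mechanics can be sketched without any pre-trained weights. In the NumPy example below, a fixed random projection stands in for the frozen pre-trained layers (an assumption made purely for illustration), features are computed once, and only a small logistic-regression head is trained on them. The toy data and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pre-trained backbone: fixed weights that are never updated.
W_frozen = 0.5 * rng.normal(size=(4, 8))

def extract_features(x):
    """Forward pass through the frozen layers (ReLU of a fixed projection)."""
    return np.maximum(0.0, x @ W_frozen)

# Toy binary task (assumption): the label depends on the sign of input 0.
X = rng.normal(size=(100, 4))
y = (X[:, 0] > 0).astype(float)

features = extract_features(X)       # computed once, then reused
w_head, b_head = np.zeros(8), 0.0    # the only trainable parameters

def head_loss(w, b):
    """Binary cross-entropy of the head on the extracted features."""
    p = 1.0 / (1.0 + np.exp(-(features @ w + b)))
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

initial_loss = head_loss(w_head, b_head)
W_before = W_frozen.copy()

for _ in range(200):                 # plain gradient descent on the head only
    p = 1.0 / (1.0 + np.exp(-(features @ w_head + b_head)))
    grad = p - y
    w_head -= 0.1 * features.T @ grad / len(y)
    b_head -= 0.1 * grad.mean()

final_loss = head_loss(w_head, b_head)
print(initial_loss, final_loss)
```

The backbone weights are identical before and after training; only the head has changed, which is the defining property of feature extraction.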

This section presents an implementation of feature extraction, utilizing VGG16 as a pre-trained model within the Keras framework.

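    from keras.applications import VGG16
    from keras.layers import Dense, Flatten
    from keras.models import Model

    # Example values (assumptions); set these to match your dataset
    img_width, img_height, num_classes = 224, 224, 10

    # Load pre-trained VGG16 model without its ImageNet classifier.
    # Keras expects input_shape as (height, width, channels).
    vgg_model = VGG16(weights='imagenet', include_top=False, input_shape=(img_height, img_width, 3))

    # Freeze layers except for last four layers
    # (for pure feature extraction, freeze every layer instead)
    for layer in vgg_model.layers[:-4]:
        layer.trainable = False

    # Define custom FC layers
    fc1 = Flatten()(vgg_model.output)
    fc2 = Dense(128, activation='relu')(fc1)
    predictions = Dense(num_classes, activation='softmax')(fc2)

    # Create custom model
    custom_model = Model(inputs=vgg_model.input, outputs=predictions)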

The include_top parameter determines whether to include the default top layers of a pre-trained model. When set to True, the model ends in a classification layer with 1000 neurons corresponding to the ImageNet categories. When set to False, the top layers are omitted and the model outputs intermediate representations, which is what we want for feature extraction. Omitting the large fully connected top layers also saves memory, and freezing the remaining layers saves computation during training because no gradients need to be computed for their weights. Note that Keras offers various pre-trained models, so you may need to adjust the architecture slightly depending on the model you choose.
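To make the "intermediate representation" concrete, the short sketch below works out the output shape of VGG16 with include_top=False from its architecture: five 2x2 max-pooling stages and 512 channels in the last convolutional block. The helper name `vgg16_feature_shape` is hypothetical; it only does the arithmetic and does not load the model.

```python
def vgg16_feature_shape(img_height, img_width):
    """Shape of VGG16's output with include_top=False (channels last).

    VGG16 has five 2x2 max-pooling stages, each halving height and width
    (rounding down), and its last convolutional block has 512 channels.
    """
    h, w = img_height, img_width
    for _ in range(5):
        h, w = h // 2, w // 2
    return (h, w, 512)

shape = vgg16_feature_shape(224, 224)
flattened = shape[0] * shape[1] * shape[2]
print(shape, flattened)  # (7, 7, 512) 25088
```

The flattened size is the number of inputs the first custom Dense layer receives when a Flatten layer follows the base model.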

Upon defining the custom model, we can proceed to compile it using the categorical_crossentropy loss function and a suitable optimizer in a standard manner. Training typically proceeds normally until achieving convergence. Afterward, evaluation of the model on a test set is typically conducted to assess its performance metrics, including accuracy, precision, recall, and F1 score, to determine its effectiveness on the target task.
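The metrics named above can be computed directly from predicted and true labels. The following NumPy sketch covers the binary case; the function name `binary_metrics` and the sample labels are made up for illustration.

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for 0/1 labels."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))  # true positives
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))  # false positives
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))  # false negatives
    accuracy = float(np.mean(y_true == y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

metrics = binary_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0])
print(metrics)
```

For multi-class problems, the same quantities are usually computed per class and then averaged.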

3.2 Fine-Tuning

Fine-tuning retrains the upper layers of a pre-trained model for a particular task while keeping the lower layers fixed. This can speed up convergence and reduce overfitting. Starting from a pre-trained model, we choose which layers to preserve and which to update, then retrain the selected layers on the target task. The resulting model benefits from the knowledge stored in the pre-trained layers, so it typically reaches good parameters with less training.

This implementation utilizes ResNet50 as a pre-trained model within the Keras framework.

    from keras.applications import ResNet50
    from keras.models import Sequential
    from keras.layers import Dense, Dropout, Flatten

    # img_width, img_height, num_classes, epochs and the X_*/y_* arrays
    # are assumed to be defined for your dataset.

    # Load pre-trained ResNet50 model
    resnet_model = ResNet50(weights='imagenet', include_top=False, input_shape=(img_height, img_width, 3))

    # Freeze all layers except for the last 10 layers
    for layer in resnet_model.layers[:-10]:
        layer.trainable = False

    # Stack custom top layers for the target task on the pre-trained base
    custom_model = Sequential()
    custom_model.add(resnet_model)
    custom_model.add(Flatten())
    custom_model.add(Dense(128, activation='relu'))
    custom_model.add(Dropout(0.5))
    custom_model.add(Dense(num_classes, activation='softmax'))

    # Compile custom model
    custom_model.compile(optimizer='adam',
                         loss='categorical_crossentropy',
                         metrics=['accuracy'])

    # Train custom model, monitoring a held-out validation split
    history = custom_model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=epochs)

Here we excluded the original classifier layers of the pre-trained model by setting include_top=False. We then defined a custom sequential model that flattens the final representation of the pre-trained model and adds a dense hidden layer with dropout regularization, followed by a softmax output layer for classification. During compilation, we used the Adam optimizer with the categorical cross-entropy loss function and tracked accuracy as the evaluation metric.
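The dropout regularization mentioned above can be illustrated in a few lines of NumPy. This is a minimal sketch of the common "inverted dropout" formulation, in which surviving activations are rescaled during training; the function name and the example array are assumptions for illustration.

```python
import numpy as np

def dropout(x, rate, training, rng):
    """Inverted dropout: zero a fraction `rate` of units during training and
    rescale the survivors so the expected activation is unchanged."""
    if not training or rate == 0.0:
        return x                          # inference: pass through unchanged
    keep = rng.random(x.shape) >= rate    # True for units that survive
    return x * keep / (1.0 - rate)

rng = np.random.default_rng(0)
activations = np.ones((2, 6))
train_out = dropout(activations, 0.5, training=True, rng=rng)
eval_out = dropout(activations, 0.5, training=False, rng=rng)
print(train_out)
print(eval_out)
```

With a rate of 0.5, each training-time output is either 0 (dropped) or 2 (kept and rescaled), while the inference-time output equals the input.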

After training, the model can be assessed on a test set to determine if the improvement aligns with our expectations.

Fine-tuning cannot guarantee better performance on every task, particularly when the original model was pre-trained for a very different objective. In practice, it often takes experimentation with hyperparameters and architectural choices to find the configuration that works best for a specific task.

3.3 Multi-Task Learning

Multi-task learning trains a single model on several related tasks simultaneously. This can improve generalization, since each task contributes information to the shared part of the model instead of the model relying on a single source of supervision. Each task has its own dedicated head, which is optimized with its own loss.

As an example, consider a model that must both detect faces and recognize emotional expressions in photographs. Instead of training two separate models, one on a face dataset and another on an emotion dataset, we can train one model on a combined dataset that carries both kinds of labels. During inference, the model produces probability scores for each task, so it can draw conclusions about both the presence of a face and the emotional state.

The following implementation of multi-task learning uses Xception as the pre-trained convolutional network in Keras.

    from keras.applications import Xception
    from keras.models import Model
    from keras.layers import GlobalAveragePooling2D, Dense, Dropout

    # img_width, img_height, num_emotions, epochs and the X_*/y_* arrays
    # are assumed to be defined for your dataset.

    # Load pre-trained Xception model
    xception_model = Xception(weights='imagenet', include_top=False, input_shape=(img_height, img_width, 3))

    # Freeze layers except for the last 27 layers
    for layer in xception_model.layers[:-27]:
        layer.trainable = False

    # Shared representation: pool the backbone's feature maps once
    features = GlobalAveragePooling2D()(xception_model.output)

    # Face head (binary output); the layer name is the key used in compile/fit
    x = Dense(128, activation='relu')(features)
    x = Dropout(0.5)(x)
    face_output = Dense(1, activation='sigmoid', name='face')(x)

    # Emotion head (one class per emotion)
    x = Dense(64, activation='relu')(features)
    x = Dropout(0.5)(x)
    emotion_output = Dense(num_emotions, activation='softmax', name='emotion')(x)

    # Combine pre-trained base and both heads into a single model
    combined_model = Model(inputs=xception_model.input,
                           outputs=[face_output, emotion_output])

    # Compile combined model
    combined_model.compile(optimizer='adam',
                           loss={'face': 'binary_crossentropy',
                                 'emotion': 'sparse_categorical_crossentropy'},
                           loss_weights={'face': 0.2,
                                         'emotion': 0.8},
                           metrics={'face': ['accuracy'],
                                    'emotion': ['accuracy']})

    # Train combined model: one image input, one label array per head
    history = combined_model.fit(X_train,
                                 {'face': y_faces_train,
                                  'emotion': y_emotions_train},
                                 validation_data=(X_val,
                                                  {'face': y_faces_val,
                                                   'emotion': y_emotions_val}),
                                 epochs=epochs)

In this case, we load the pre-trained Xception model and freeze all but its last 27 layers. We build two specialized heads for our two target tasks: one for face detection and another for emotion recognition. Both heads are attached to the output of the pre-trained base, and the base and heads are merged into a single model using the Keras functional API.

We train the model using binary cross-entropy loss for face detection and sparse categorical cross-entropy loss for emotion recognition, with loss weights of 0.2 and 0.8 respectively to control how much each head contributes to the total loss. We also track accuracy for each head. The combined model is trained with mini-batch updates using the Adam optimizer on the labeled data for both tasks.
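The weighted combination of the two losses can be written out by hand. The NumPy sketch below computes a binary cross-entropy for the face head, a sparse categorical cross-entropy for the emotion head, and their weighted sum with the coefficients 0.2 and 0.8 from the text. The sample predictions are invented so that both losses come out to ln 2.

```python
import numpy as np

def binary_crossentropy(y_true, p):
    """Mean binary cross-entropy for probabilities p."""
    p = np.clip(p, 1e-7, 1 - 1e-7)
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))

def sparse_categorical_crossentropy(labels, probs):
    """Mean cross-entropy for integer labels and per-class probabilities."""
    probs = np.clip(probs, 1e-7, 1.0)
    return float(-np.mean(np.log(probs[np.arange(len(labels)), labels])))

# Hypothetical predictions for a batch of two images.
face_true = np.array([1.0, 0.0])
face_prob = np.array([0.5, 0.5])
emotion_true = np.array([0, 2])
emotion_prob = np.array([[0.5, 0.25, 0.25],
                         [0.25, 0.25, 0.5]])

face_loss = binary_crossentropy(face_true, face_prob)
emotion_loss = sparse_categorical_crossentropy(emotion_true, emotion_prob)
total_loss = 0.2 * face_loss + 0.8 * emotion_loss
print(face_loss, emotion_loss, total_loss)
```

Because the emotion loss carries four times the weight of the face loss, gradients from the emotion head dominate the updates to the shared layers.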

During training, we track the progress of both heads through separate evaluation metrics. Once training has converged, we can evaluate the full model on a test set to assess its overall quality.

As with fine-tuning, multi-task learning does not always yield significant improvements, particularly when the tasks have little in common. Good results may require a careful choice of tasks, loss weights, and other hyperparameters.

4. Code Examples and Explanations

In addition to explaining the fundamental concepts, this section demonstrates practical implementations of feature extraction, fine-tuning, and multi-task learning using the widely used deep learning frameworks Keras and PyTorch. The demonstrations highlight the essential syntax of each technique and can serve as a starting point for similar approaches in your own projects.

Let's start by importing the required libraries and defining the necessary variables.

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.datasets import make_blobs
    from tensorflow.keras.utils import to_categorical

    # Set random seed for reproducibility
    np.random.seed(42)

    # Set number of classes and samples per class
    num_classes = 2
    samples_per_class = 100

    # Generate synthetic dataset for demonstration purposes
    X, y = make_blobs(n_samples=num_classes*samples_per_class, centers=num_classes, n_features=2, cluster_std=2, random_state=42)
    y = to_categorical(y)  # one-hot encode the labels

    # Shuffle, then split 90% / 10% into training and validation sets
    shuffle_idx = np.arange(len(y))
    np.random.shuffle(shuffle_idx)
    X, y = X[shuffle_idx], y[shuffle_idx]
    n_train = int(0.9 * len(X))
    X_train, y_train = X[:n_train], y[:n_train]
    X_val, y_val = X[n_train:], y[n_train:]

This code generates a synthetic dataset of two Gaussian blobs (clusters of points scattered around two centers with a standard deviation of 2). The dataset is then randomly shuffled and divided into a training set (X_train and y_train) and a validation set (X_val and y_val).
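The shuffle-and-split step can be wrapped in a small reusable helper. The sketch below is one possible version; the function name `shuffle_split`, its default 90/10 ratio, and the demo arrays are assumptions for illustration.

```python
import numpy as np

def shuffle_split(X, y, train_fraction=0.9, seed=42):
    """Shuffle X and y with the same permutation, then split them."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_train = int(train_fraction * len(X))
    train_idx, val_idx = idx[:n_train], idx[n_train:]
    return X[train_idx], y[train_idx], X[val_idx], y[val_idx]

# Demo data: row i of X_demo is [2*i, 2*i + 1] and its label is i.
X_demo = np.arange(40).reshape(20, 2)
y_demo = np.arange(20)
X_tr, y_tr, X_va, y_va = shuffle_split(X_demo, y_demo)
print(X_tr.shape, X_va.shape)
```

Indexing both arrays with the same permutation keeps every sample paired with its label, and computing the split point from the array length avoids hard-coded indices that may not match the dataset size.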

Now let's generate plots to visualize our synthetic dataset:

    fig, ax = plt.subplots(figsize=(8, 8))
    # Color each point by its class (y is one-hot, so argmax recovers the label)
    ax.scatter(X[:, 0], X[:, 1], c=y.argmax(axis=-1), cmap="jet")
    ax.set_xlabel("Feature 1")
    ax.set_ylabel("Feature 2")
    plt.show()

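The plot shows the synthetic dataset, colored by class. Next, we turn to feature extraction, fine-tuning, and multi-task learning, covering Keras and PyTorch in turn. Note that the pre-trained image models below expect image tensors, so X_train and the related arrays in those examples stand for an image dataset of your own and not for the 2-D blobs generated above.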

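4.1 Feature Extraction in Keras

The most basic feature extraction setup in Keras builds on a pre-trained VGG model. We can import VGG16 from the Keras library and freeze the weights of all but its last four layers so that they are not updated. We then add custom fully connected layers as needed to perform classification.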

    from tensorflow.keras.applications import VGG16
    from tensorflow.keras.models import Model
    from tensorflow.keras.layers import Input, Flatten, Dense

    # img_width, img_height and num_classes are assumed to be defined.

    # Load pre-trained VGG16 model
    base_model = VGG16(weights='imagenet', include_top=False, input_shape=(img_height, img_width, 3))

    # Freeze layers except for last four layers
    for layer in base_model.layers[:-4]:
        layer.trainable = False

    # Define custom FC layers
    inputs = Input(shape=(img_height, img_width, 3))
    x = base_model(inputs)
    x = Flatten()(x)
    outputs = Dense(num_classes, activation='softmax')(x)

    # Create custom model
    model = Model(inputs=inputs, outputs=outputs)

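This example uses the Keras API that ships with TensorFlow (tensorflow.keras); switching from standalone Keras is mostly a matter of changing the imports.

We can achieve the same thing by loading a pre-trained ResNet50 model: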

    from tensorflow.keras.applications import ResNet50
    from tensorflow.keras.models import Model
    from tensorflow.keras.layers import Input, Flatten, Dense

    # img_width, img_height and num_classes are assumed to be defined.

    # Load pre-trained ResNet50 model
    base_model = ResNet50(weights='imagenet', include_top=False, input_shape=(img_height, img_width, 3))

    # Freeze all layers except for the last 10 layers
    for layer in base_model.layers[:-10]:
        layer.trainable = False

    # Add custom top layers for the target task
    inputs = Input(shape=(img_height, img_width, 3))
    x = base_model(inputs)
    x = Flatten()(x)
    outputs = Dense(num_classes, activation='softmax')(x)

    # Create custom model
    model = Model(inputs=inputs, outputs=outputs)

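The resulting model outputs a num_classes-dimensional vector that represents the predicted probability distribution of the input image over the classes.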
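The probability vector comes from the softmax activation of the final layer. As a small illustration, the NumPy sketch below applies a numerically stable softmax to made-up raw scores and picks the most likely class for each sample.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax; subtracting the max avoids overflow in exp."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

# Hypothetical raw scores for two images and three classes.
logits = np.array([[2.0, 1.0, 0.1],
                   [0.0, 0.0, 0.0]])
probs = softmax(logits)
predicted = probs.argmax(axis=-1)   # index of the most likely class
print(probs, predicted)
```

Each row of `probs` is non-negative and sums to 1, which is what allows it to be read as a distribution over classes.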

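4.2 Feature Extraction in PyTorch

This section shows two ways of working with pre-trained features in PyTorch. The first builds on a pre-trained model based on the classic VGG network; the second is presented in connection with visual attention mechanisms.

Method 1: a VGG-based pre-trained model

The vgg16() function in PyTorch's torchvision.models package gives us a pre-trained VGG model. For ResNet-based models, we can likewise call the corresponding function to obtain a pre-trained version.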

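    import torch
    import torch.nn as nn
    import torchvision.models as models

    # Use the GPU when available, otherwise fall back to the CPU
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

    # Load pre-trained VGG16 model
    # (torchvision >= 0.13; older versions use pretrained=True)
    vgg16 = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).to(device)

    # Freeze all pre-trained parameters
    for param in vgg16.parameters():
        param.requires_grad_(False)

    # Replace last layer with custom FC layers for classification; the new
    # layers are trainable by default (num_classes is assumed to be defined)
    classifier = nn.Sequential(nn.Linear(in_features=4096, out_features=256, bias=True),
                               nn.ReLU(),
                               nn.Dropout(p=0.5),
                               nn.Linear(in_features=256, out_features=num_classes, bias=True)).to(device)
    vgg16.classifier[-1] = classifier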

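Here we use CUDA, which lets us run the neural network model on a GPU.

Method 2: visual attention mechanism

A visual attention mechanism lets a model concentrate on the most relevant regions of an image. In reinforcement-learning formulations of attention, an agent learns to identify objects in the image and receives reward or penalty signals based on the position and shape of what it recognizes, which helps it interpret the scene more accurately and make decisions more efficiently. The code below does not implement attention itself; it covers the surrounding steps of preparing an image and running it through the pre-trained network.

With the PIL library, we can read an image file and preprocess it.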

    import torch
    import PIL.Image
    import torchvision.transforms as transforms

    # Read image file ('example.jpg' is a placeholder path)
    image = PIL.Image.open('example.jpg').convert('RGB')

    # Preprocess image
    preprocess = transforms.Compose([
        transforms.Resize(224),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])
    input_tensor = preprocess(image)
    input_batch = input_tensor.unsqueeze(0) # create a mini-batch as expected by the model

    # vgg16 is assumed to be the pre-trained model with its original
    # 1000-class ImageNet classifier; imagenet_classes is assumed to be
    # a list of the 1000 ImageNet label names in index order.
    vgg16.eval()  # disable dropout for inference

    # Move input and model to GPU for speed if available
    if torch.cuda.is_available():
        input_batch = input_batch.to('cuda')
        vgg16.to('cuda')

    with torch.no_grad():
        output = vgg16(input_batch)
    probs = torch.nn.functional.softmax(output[0], dim=0)

    # Output top 5 predicted labels with probability and percentage
    top_probs, top_indices = probs.topk(5)
    for prob, idx in zip(top_probs.tolist(), top_indices.tolist()):
        print(imagenet_classes[idx], prob, f"{prob * 100:.2f}%")

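This code prints the label names of the five most likely classes together with their probabilities, both as fractions and as percentages.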
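The top-k selection itself is independent of the deep learning framework. The NumPy sketch below shows the idea on a four-class toy example; the label names and probabilities are invented for illustration.

```python
import numpy as np

def top_k(probs, labels, k=5):
    """Return the k (label, probability) pairs with the highest probability."""
    order = np.argsort(probs)[::-1][:k]   # indices sorted by descending prob
    return [(labels[i], float(probs[i])) for i in order]

labels = ["cat", "dog", "fox", "owl"]     # hypothetical class names
probs = np.array([0.1, 0.6, 0.05, 0.25])
top2 = top_k(probs, labels, k=2)
print(top2)  # [('dog', 0.6), ('owl', 0.25)]
```

Multiplying each probability by 100 gives the percentage form.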

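4.3 Fine-Tuning in Keras

Fine-tuning is an important technique for adapting a neural network model to a new task.

Suppose our target task is to distinguish cats from dogs, and we already have a VGG-based model pre-trained for image classification. For the cat-vs-dog task, we only need to adjust the parameters of the last few layers and the new fully connected layers to fine-tune the model.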

    from tensorflow.keras.optimizers import SGD
    from tensorflow.keras.applications import VGG16
    from tensorflow.keras.models import Model
    from tensorflow.keras.layers import Input, Flatten, Dense, Dropout

    # img_width, img_height, num_classes and the X_*/y_* image arrays
    # are assumed to be defined for your dataset.

    # Load pre-trained VGG16 model
    base_model = VGG16(weights='imagenet', include_top=False, input_shape=(img_height, img_width, 3))

    # Freeze all layers except for the last five layers
    for layer in base_model.layers[:-5]:
        layer.trainable = False

    # Add custom top layers for the target task
    inputs = Input(shape=(img_height, img_width, 3))
    x = base_model(inputs)
    x = Flatten()(x)
    x = Dense(512, activation='relu')(x)
    x = Dropout(0.5)(x)
    outputs = Dense(num_classes, activation='softmax')(x)

    # Create custom model
    model = Model(inputs=inputs, outputs=outputs)

    # Compile custom model with a small learning rate
    model.compile(loss='categorical_crossentropy',
                  optimizer=SGD(learning_rate=0.001, momentum=0.9),
                  metrics=['accuracy'])

    # Continue training custom model on new task
    history = model.fit(X_train, y_train,
                        validation_data=(X_val, y_val),
                        epochs=20)

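The lower convolutional layers keep the parameters they learned from ImageNet pre-training, so we only add a few fully connected layers on top and fine-tune them together with the last few unfrozen layers. We then continue training to see how much the model improves on the new target task.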
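To show what the SGD settings used for fine-tuning (learning rate 0.001, momentum 0.9) do to a single weight, here is a minimal sketch of the classical momentum update. It assumes the formulation in which the velocity accumulates the scaled negative gradient, which is how the Keras documentation describes SGD with momentum; the function name and the numbers are illustrative.

```python
def sgd_momentum_step(w, velocity, grad, learning_rate=0.001, momentum=0.9):
    """One SGD-with-momentum update for a single parameter."""
    velocity = momentum * velocity - learning_rate * grad
    return w + velocity, velocity

w, v = 1.0, 0.0
w, v = sgd_momentum_step(w, v, grad=2.0)
first_step = (w, v)      # approximately (0.998, -0.002)
w, v = sgd_momentum_step(w, v, grad=2.0)
second_step = (w, v)     # approximately (0.9942, -0.0038)
print(first_step, second_step)
```

The second step moves further than the first because the velocity carries over part of the previous update. The small learning rate keeps each step modest, which helps preserve the pre-trained weights during fine-tuning.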

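4.4 Fine-Tuning in PyTorch

Fine-tuning in PyTorch works much as it does in Keras: we freeze the pre-trained parameters and place a new set of fully connected layers on top of the model, which are then trained on the new task.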

    import torch
    import torch.nn as nn
    import torch.optim as optim
    import torchvision.models as models

    # num_classes and dataloader (a torch.utils.data.DataLoader yielding
    # (inputs, labels) batches) are assumed to be defined.
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

    # Load pre-trained VGG16 model
    # (torchvision >= 0.13; older versions use pretrained=True)
    vgg16 = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).to(device)

    # Freeze all pre-trained parameters
    for param in vgg16.parameters():
        param.requires_grad_(False)

    # Replace last layer with custom FC layers for classification
    classifier = nn.Sequential(nn.Linear(in_features=4096, out_features=256, bias=True),
                               nn.ReLU(),
                               nn.Dropout(p=0.5),
                               nn.Linear(in_features=256, out_features=num_classes, bias=True)).to(device)
    vgg16.classifier[-1] = classifier

    criterion = nn.CrossEntropyLoss()
    # Only the new classifier has trainable parameters
    optimizer = optim.SGD(params=classifier.parameters(), lr=0.001, momentum=0.9)

    num_samples = len(dataloader.dataset)

    # Continue training custom model on new task
    for epoch in range(20):
        running_loss = 0.0
        num_correct = 0

        # Iterate over data
        for inputs, labels in dataloader:
            inputs = inputs.to(device)
            labels = labels.to(device)

            # Zero the parameter gradients
            optimizer.zero_grad()

            # Forward + backward + optimize
            outputs = vgg16(inputs)
            _, preds = torch.max(outputs, 1)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            # Statistics
            running_loss += loss.item() * inputs.size(0)
            num_correct += torch.sum(preds == labels).item()

        print('[Epoch %d] Loss: %.3f | Acc: %.3f' % (epoch+1, running_loss / num_samples, num_correct / num_samples))

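There is a notable difference between Keras and PyTorch. In Keras, fit() runs the training loop and computes the loss for us, whereas in PyTorch we write the loop ourselves. We therefore define the loss function and optimizer explicitly and update the model parameters as we iterate over the dataset.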
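Part of what a hand-written loop has to do is bookkeeping that fit() would otherwise handle. The framework-free sketch below shows how per-batch mean losses and correct counts can be combined into per-epoch statistics; the function name and the batch values are made up for illustration.

```python
def epoch_statistics(batches):
    """Combine per-batch results into an epoch-level loss and accuracy.

    `batches` is an iterable of (mean_loss, num_correct, batch_size).
    """
    running_loss, num_correct, num_samples = 0.0, 0, 0
    for mean_loss, correct, size in batches:
        running_loss += mean_loss * size   # undo the per-batch averaging
        num_correct += correct
        num_samples += size
    return running_loss / num_samples, num_correct / num_samples

# Two hypothetical batches of different sizes.
avg_loss, accuracy = epoch_statistics([(0.5, 3, 4), (1.0, 1, 2)])
print(avg_loss, accuracy)
```

Weighting each batch loss by its batch size matters when the last batch is smaller than the others; a plain average of batch means would over-weight it.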

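4.5 Multi-Task Learning in Keras

Multi-task learning is a machine learning technique that shows considerable potential in practice. It lets one model handle several closely related tasks at the same time. For example, a single model can recognize objects and the actions they are performing.

In Keras, the most direct approach is to build a separate top network (head) for each task and join them to a shared base in a single model.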

    from tensorflow.keras.applications import VGG16
    from tensorflow.keras.layers import GlobalAveragePooling2D, Dense, Dropout
    from tensorflow.keras.models import Model
    from tensorflow.keras.optimizers import Adam

    # img_width, img_height, num_emotions and the X_*/y_* arrays are assumed
    # to be defined; y_emotions_* are assumed to be one-hot encoded.

    # Load pre-trained VGG16 model
    base_model = VGG16(weights='imagenet', include_top=False, input_shape=(img_height, img_width, 3))

    # Freeze layers except for last four layers
    for layer in base_model.layers[:-4]:
        layer.trainable = False

    # Shared representation: pool the base model's feature maps once
    features = GlobalAveragePooling2D()(base_model.output)

    # Face head; the layer name is the key used in compile/fit
    x = Dense(128, activation='relu')(features)
    x = Dropout(0.5)(x)
    face_output = Dense(1, activation='sigmoid', name='face')(x)

    # Emotion head
    x = Dense(64, activation='relu')(features)
    x = Dropout(0.5)(x)
    emotion_output = Dense(num_emotions, activation='softmax', name='emotion')(x)

    # Combine pre-trained base and custom heads into single model
    combined_model = Model(inputs=base_model.input,
                           outputs=[face_output, emotion_output])

    # Compile combined model with one loss per head
    combined_model.compile(loss={'face': 'binary_crossentropy',
                                 'emotion': 'categorical_crossentropy'},
                           optimizer=Adam(learning_rate=0.001),
                           metrics={'face': ['accuracy'],
                                    'emotion': ['accuracy']})

    # Train combined model
    history = combined_model.fit(X_train, {'face': y_faces_train, 'emotion': y_emotions_train},
                                 validation_data=(X_val, {'face': y_faces_val, 'emotion': y_emotions_val}),
                                 epochs=20)

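With this setup, only a few lines are needed to assemble the model, which then handles face recognition and expression recognition at the same time. We can train it with the fit() method.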

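4.6 Multi-Task Learning in PyTorch

As in Keras, there are several ways to implement multi-task learning in PyTorch. PyTorch has no built-in support for it, however, so we have to write the code ourselves.

We need a DataLoader that supplies the labels for both tasks, and we train one head per task on top of a shared network. When designing the model, we need to consider the following:

The shared base network;
The output size of each task;
How to combine the outputs.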
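The three design points above can be sketched in NumPy before committing to a framework. In this illustration, a fixed random matrix stands in for the shared base network, each task gets its own output matrix of the appropriate size, and the outputs are combined by returning them in a dictionary keyed by task. All sizes and names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
num_emotions = 7                                 # hypothetical class count

W_base = rng.normal(size=(16, 32))               # shared base network
W_face = rng.normal(size=(32, 1))                # face head: 1 logit
W_emotion = rng.normal(size=(32, num_emotions))  # emotion head: 1 per class

def multi_task_forward(x):
    """Run the shared base once, then feed its output to every head."""
    shared = np.maximum(0.0, x @ W_base)
    return {"face": shared @ W_face, "emotion": shared @ W_emotion}

outputs = multi_task_forward(rng.normal(size=(5, 16)))
print(outputs["face"].shape, outputs["emotion"].shape)
```

Keeping the outputs in a dictionary makes it straightforward to pair each head with its own labels and loss function during training.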

    import torch
    import torch.nn as nn
    import torch.optim as optim
    import torchvision.models as models

    # num_emotions and dataloader are assumed to be defined; each batch is
    # (inputs, labels) with labels a dict holding 'face' and 'emotion' tensors.
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

    # Load pre-trained VGG16 model
    # (torchvision >= 0.13; older versions use pretrained=True)
    vgg16 = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).to(device)

    # Freeze all pre-trained parameters
    for param in vgg16.parameters():
        param.requires_grad_(False)

    # Remove the old 1000-class layer so the shared base emits 4096-d features
    vgg16.classifier[6] = nn.Identity()
    vgg16.eval()  # the frozen base is used as a fixed feature extractor

    # Custom heads for both tasks
    face_head = nn.Sequential(nn.Linear(in_features=4096, out_features=128, bias=True),
                              nn.ReLU(),
                              nn.Dropout(p=0.5),
                              nn.Linear(in_features=128, out_features=1, bias=True)).to(device)

    emotion_head = nn.Sequential(nn.Linear(in_features=4096, out_features=64, bias=True),
                                 nn.ReLU(),
                                 nn.Dropout(p=0.5),
                                 nn.Linear(in_features=64, out_features=num_emotions, bias=True)).to(device)

    # Initialize optimizers and loss functions for both heads
    face_criterion = nn.BCEWithLogitsLoss()
    face_optimizer = optim.SGD(params=face_head.parameters(), lr=0.001, momentum=0.9)

    emotion_criterion = nn.CrossEntropyLoss()
    emotion_optimizer = optim.SGD(params=emotion_head.parameters(), lr=0.001, momentum=0.9)

    num_samples = len(dataloader.dataset)

    for epoch in range(20):
        # Keep track of losses and accuracies for both heads
        face_running_loss, face_num_correct = 0.0, 0
        emotion_running_loss, emotion_num_correct = 0.0, 0

        # Iterate over data
        for inputs, labels in dataloader:
            inputs = inputs.to(device)
            face_labels = labels['face'].float().view(-1, 1).to(device)
            emotion_labels = labels['emotion'].to(device)

            # Zero the parameter gradients
            face_optimizer.zero_grad()
            emotion_optimizer.zero_grad()

            # Forward: shared features once, then one pass per head
            with torch.no_grad():
                features = vgg16(inputs)
            face_logits = face_head(features)
            emotion_logits = emotion_head(features)

            # Backward + optimize, separately for each head
            face_loss = face_criterion(face_logits, face_labels)
            face_loss.backward()
            face_optimizer.step()

            emotion_loss = emotion_criterion(emotion_logits, emotion_labels)
            emotion_loss.backward()
            emotion_optimizer.step()

            # Statistics
            face_running_loss += face_loss.item() * inputs.size(0)
            emotion_running_loss += emotion_loss.item() * inputs.size(0)
            face_preds = (torch.sigmoid(face_logits) > 0.5).float()
            face_num_correct += (face_preds == face_labels).sum().item()
            emotion_num_correct += (emotion_logits.argmax(dim=1) == emotion_labels).sum().item()

        # Print statistics after each epoch
        print('[Epoch %d] Face Loss: %.3f | Acc: %.3f' % (epoch+1, face_running_loss / num_samples, face_num_correct / num_samples))
        print('[Epoch %d] Emotion Loss: %.3f | Acc: %.3f' % (epoch+1, emotion_running_loss / num_samples, emotion_num_correct / num_samples))

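The code above shows one way to apply multi-task learning in PyTorch: a single DataLoader supplies the labels for both tasks, and a separate head is trained for each task on top of the shared network.

To support multi-task learning, the model has several output branches, each with its own loss function and optimizer. Writing the code that combines the outputs of these branches is the key remaining step in completing such a system.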
