Colab/PyTorch - 004 Torchvision Semantic Segmentation

1. Motivation
2. Semantic Segmentation - Applications
- 2.1 Autonomous Driving
- 2.2 Facial Segmentation
- 2.3 Indoor Object Segmentation
- 2.4 Geographic Remote Sensing
3. Semantic Segmentation - torchvision
- 3.1 FCN with a ResNet-101 Backbone
  - 3.1.1 Load the Model
  - 3.1.2 Load the Image
  - 3.1.3 Preprocess the Image
  - 3.1.4 Forward Pass through the Network
  - 3.1.5 Decode the Output
  - 3.1.6 Final Result
- 3.2 Semantic Segmentation with DeepLab
- 3.3 Multi-Object Semantic Segmentation
4. Summary
- 4.1 Inference Time
- 4.2 Model Size
5. References

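1. Motivation

The series follows a step-by-step progression:

Has the model been validated as logically sound?
Is the measured data suitable for linear regression analysis?
Can the measured data be used for nonlinear regression modeling?
Can the measured data support multi-factor (image) classification?
Do the model parameter settings help a multi-factor (image) classifier converge?
Can the ranges of key parameters be determined from measurements?
Can multi-factor (image) parameters be filtered effectively?
Can a multi-factor (image) classification model be built to recognize targets?
Can a model of the relationships among multi-factor (image) targets be built?
Can measured results reveal how multi-factor (image) targets interact within a scene?

Once objects have been classified, the next step is to outline their contours so that the different objects in a single view can be told apart clearly, which is not easy. Relating objects to one another and parsing the scene in a still image is also a challenging task; the Bilibili video https://www.bilibili.com/video/BV1WD421J7B5/ discusses this in depth.

Perform semantic segmentation (contour outlining) on the following image:

In semantic segmentation, every pixel of the image is assigned to the class it belongs to. In this image, for example, we can distinguish people, bicycles, and the background. In other words, the task is to identify and separate the objects in the picture and label each of them explicitly.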

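2. Semantic Segmentation - Applications

2.1 Autonomous Driving

In autonomous driving, the onboard computer needs a good perception of the environment ahead of the vehicle. Identifying and separating cars, pedestrians, lane markings, and traffic signs is essential.

2.2 Facial Segmentation

Facial segmentation divides a face into semantically similar regions such as the lips and the eyes. It is useful in many real-world applications; a particularly interesting one is virtual makeup.

2.3 Indoor Object Segmentation

Augmented reality (AR) and virtual reality (VR) are used in many fields. An AR system can segment an indoor scene into regions and recognize chairs, tables, people, walls, and other objects. Knowing where these objects are makes it possible to place and manipulate virtual objects in the scene.

2.4 Geographic Remote Sensing

Remote sensing classifies every pixel of a satellite image so that land cover can be tracked region by region. If heavy deforestation is detected in some area, for example, corrective measures can be taken. Semantic segmentation of satellite imagery has many such uses.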

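3. Semantic Segmentation - torchvision

This article looks at two deep-learning semantic segmentation models available in torchvision: the Fully Convolutional Network (FCN) and DeepLab v3. Both pretrained models were trained on a subset of the COCO train2017 dataset restricted to the 20 Pascal VOC categories, which together with the background class gives 21 output classes.

Before we start, let's look at the inputs and outputs of these models.

The models expect a 3-channel RGB image normalized with the ImageNet statistics, that is, mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225] for the three color channels.

So the input has shape [Ni x Ci x Hi x Wi], where

    Ni -> batch size
    Ci -> number of channels (3)
    Hi -> image height
    Wi -> image width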
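
To make the input layout concrete, here is a minimal NumPy sketch of the same normalization and reshaping. It is an illustration only: the helper name to_model_input and the dummy image are assumptions, and the tutorial itself does this step with torchvision transforms.

```python
import numpy as np

# ImageNet channel statistics quoted above (RGB order).
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def to_model_input(image_hwc_uint8):
    """Turn one H x W x 3 uint8 RGB image into a 1 x 3 x H x W float batch."""
    x = image_hwc_uint8.astype(np.float32) / 255.0   # scale [0, 255] -> [0, 1]
    x = (x - MEAN) / STD                             # per-channel standardization
    x = np.transpose(x, (2, 0, 1))                   # H x W x C -> C x H x W
    return x[np.newaxis, ...]                        # add the batch axis (Ni = 1)

# Hypothetical 4 x 6 mid-gray image, used only to show the shapes.
dummy = np.full((4, 6, 3), 128, dtype=np.uint8)
batch = to_model_input(dummy)
print(batch.shape)  # (1, 3, 4, 6)
```

The batch axis comes first and the channel axis second, which matches the [Ni x Ci x Hi x Wi] convention above.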

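The output of the model has shape [No x Co x Ho x Wo], where

    No -> batch size (same as Ni)
    Co -> number of classes in the dataset
    Ho -> image height (the same as Hi in almost all cases)
    Wo -> image width (the same as Wi in almost all cases)

Note: the output of a torchvision segmentation model is an OrderedDict, not a torch.Tensor. The segmentation result is stored under the key out, and it is that value that has the shape [No x Co x Ho x Wo].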

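3.1 FCN with a ResNet-101 Backbone

3.1.1 Load the Model

torchvision ships a pretrained FCN model built on a ResNet-101 backbone. Passing pretrained=True downloads the weights if they are not already in the cache, and .eval() puts the model in inference mode. As the warnings in the output show, torchvision 0.13 and later deprecate pretrained in favor of the weights argument.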

    from torchvision import models
    fcn = models.segmentation.fcn_resnet101(pretrained=True).eval()
    
    
Output:

    /usr/local/lib/python3.10/dist-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
      warnings.warn(
    /usr/local/lib/python3.10/dist-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=FCN_ResNet101_Weights.COCO_WITH_VOC_LABELS_V1`. You can also use `weights=FCN_ResNet101_Weights.DEFAULT` to get the most up-to-date weights.
      warnings.warn(msg)
    Downloading: "https://download.pytorch.org/models/fcn_resnet101_coco-7ecb50ca.pth" to /root/.cache/torch/hub/checkpoints/fcn_resnet101_coco-7ecb50ca.pth
    100%|██████████| 208M/208M [00:03<00:00, 57.5MB/s]

3.1.2 Load the Image

Next, let's get an image. We download a photo of a bird from a URL and save it, then load it with PIL, as the code shows.

    from PIL import Image
    import matplotlib.pyplot as plt
    import torch
    
    !wget -nv https://static.independent.co.uk/s3fs-public/thumbnails/image/2018/04/10/19/pinyon-jay-bird.jpg -O bird.png
    img = Image.open('./bird.png')
    plt.imshow(img); plt.show()
    
    
Output:

    2024-05-15 02:18:38 URL:https://static.independent.co.uk/s3fs-public/thumbnails/image/2018/04/10/19/pinyon-jay-bird.jpg [182904/182904] -> "bird.png" [1]

3.1.3 Preprocess the Image

To get the image into the format the model expects for inference, we preprocess it in the following steps:

Resize it so that its shorter side is 256 pixels
Center-crop it to 224 x 224
Convert it to a PyTorch tensor, which scales the pixel values from [0, 255] to [0, 1]
Normalize it with the ImageNet mean and standard deviation: mean = [0.485, 0.456, 0.406], std = [0.229, 0.224, 0.225]

Finally, we unsqueeze the image so that its shape changes from $[C \times H \times W]$ to $[1 \times C \times H \times W]$. This is needed because the network expects a batch of images as input.

    # Apply the transformations needed
    import torchvision.transforms as T
    trf = T.Compose([T.Resize(256),
                 T.CenterCrop(224),
                 T.ToTensor(),
                 T.Normalize(mean = [0.485, 0.456, 0.406],
                             std = [0.229, 0.224, 0.225])])
    inp = trf(img).unsqueeze(0)
    
    
Torchvision provides many useful utilities, and its transforms module is one of the most commonly used for image preprocessing. T.Compose takes a list of transforms and returns a single callable that applies them to an image in order.

Let's look at the transforms applied to the image:

T.Resize(256): resizes the image so that its shorter side is 256 pixels, keeping the aspect ratio.
T.CenterCrop(224): crops a 224 x 224 square from the center of the image.
T.ToTensor(): converts the image to a PyTorch tensor and scales the pixel values to [0, 1].
T.Normalize(mean, std): standardizes each channel with the given mean and standard deviation.
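
The geometry of the first two transforms can be sketched in plain Python. This is a rough illustration under stated assumptions: the function names are made up for this example, the 1200 x 800 photo is hypothetical, and torchvision's own rounding may differ by a pixel in edge cases.

```python
def resize_shorter_side(width, height, size):
    """Output (width, height) when the shorter side is scaled to `size`."""
    if width <= height:
        return size, int(size * height / width)
    return int(size * width / height), size

def center_crop_box(width, height, crop):
    """(left, top, right, bottom) of a centered crop x crop window."""
    left = int(round((width - crop) / 2.0))
    top = int(round((height - crop) / 2.0))
    return left, top, left + crop, top + crop

# Hypothetical 1200 x 800 landscape photo.
w, h = resize_shorter_side(1200, 800, 256)
print((w, h))                       # (384, 256)
print(center_crop_box(w, h, 224))   # (80, 16, 304, 240)
```

For a landscape image the crop discards far more on the left and right than on the top and bottom, which is why objects near the side edges can disappear from the result.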

3.1.4 Forward Pass through the Network

We now have a preprocessed image ready to go. Let's pass it through the model.

The model returns an OrderedDict, so we index it with the out key to get the output tensor.

    # Pass the input through the net
    out = fcn(inp)['out']
    print (out.shape)
    
    
Output:

    torch.Size([1, 21, 224, 224])

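So out is the final output of the model. Its shape is [1 x 21 x H x W]: the model was trained on 21 classes, so the output has 21 channels.

What we need now is a 2D, single-channel image in which each pixel corresponds to a class.

This 2D image (of shape [H x W]) maps each pixel to a class label: every (x, y) position holds a number between 0 and 20 that identifies a class.

The question is how to get there from the current output of shape [1 x 21 x H x W].

Simple: at each pixel position we take the index of the maximum value across the 21 channels, and that index is the class.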

    import numpy as np 
    om = torch.argmax(out.squeeze(), dim=0).detach().cpu().numpy() 
    print(om.shape)
    print(np.unique(om))
    
    
Output:

    (224, 224)
    [0 3]

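We now have a 2D image in which each pixel corresponds to a class; here only labels 0 (background) and 3 (bird) are present. The last step is to turn this 2D image into a segmentation map in which each class label is rendered in its own RGB color.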
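
To read a label map like this at a glance, it can help to translate the class ids into names and count the pixels per class. The sketch below is an illustration: the helper summarize_labels and the toy array are assumptions, and the class names follow the index order of the color table in the next section.

```python
import numpy as np

# Pascal VOC label names in index order (0 = background ... 20 = tv/monitor).
VOC_CLASSES = [
    "background", "aeroplane", "bicycle", "bird", "boat", "bottle",
    "bus", "car", "cat", "chair", "cow", "dining table", "dog", "horse",
    "motorbike", "person", "potted plant", "sheep", "sofa", "train", "tv/monitor",
]

def summarize_labels(label_map):
    """Return {class name: pixel count} for every class present in a label map."""
    ids, counts = np.unique(label_map, return_counts=True)
    return {VOC_CLASSES[i]: int(n) for i, n in zip(ids, counts)}

# Toy 3 x 4 label map standing in for `om`: 0 = background, 3 = bird.
toy = np.array([[0, 0, 3, 3],
                [0, 3, 3, 3],
                [0, 0, 0, 3]])
print(summarize_labels(toy))  # {'background': 6, 'bird': 6}
```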

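3.1.5 Decode the Output

We use the following helper function to convert the 2D image into an RGB image in which each label is mapped to its corresponding color.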

    # Define the helper function
    def decode_segmap(image, nc=21):
    
      label_colors = np.array([(0, 0, 0),  # 0=background
               # 1=aeroplane, 2=bicycle, 3=bird, 4=boat, 5=bottle
               (128, 0, 0), (0, 128, 0), (128, 128, 0), (0, 0, 128), (128, 0, 128),
               # 6=bus, 7=car, 8=cat, 9=chair, 10=cow
               (0, 128, 128), (128, 128, 128), (64, 0, 0), (192, 0, 0), (64, 128, 0),
               # 11=dining table, 12=dog, 13=horse, 14=motorbike, 15=person
               (192, 128, 0), (64, 0, 128), (192, 0, 128), (64, 128, 128), (192, 128, 128),
               # 16=potted plant, 17=sheep, 18=sofa, 19=train, 20=tv/monitor
               (0, 64, 0), (128, 64, 0), (0, 192, 0), (128, 192, 0), (0, 64, 128)])
    
      r = np.zeros_like(image).astype(np.uint8)
      g = np.zeros_like(image).astype(np.uint8)
      b = np.zeros_like(image).astype(np.uint8)
    
      # Paint every pixel of class l with that class's color, channel by channel
      for l in range(0, nc):
        idx = image == l
        r[idx] = label_colors[l, 0]
        g[idx] = label_colors[l, 1]
        b[idx] = label_colors[l, 2]
    
      rgb = np.stack([r, g, b], axis=2)
      return rgb
    
    
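Let's take a look at what this function does.

First, label_colors stores the color of each class, indexed by class id. To build an RGB image from the 2D image, we create an empty 2D array for each of the three color channels.

r, g, and b are the arrays that will become the red, green, and blue channels of the final image. Each has shape [H x W], the same as the 2D image.

We then loop over every class color stored in label_colors, find the pixels in the 2D image that carry that class label, and write the corresponding color value into each channel at those positions.

Finally, we stack the three channels to form an RGB image.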
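
The same mapping can also be written without an explicit loop by indexing the color table with the label map. This is an alternative sketch, not the helper used in this tutorial; the function name and the three-color toy palette are assumptions for illustration.

```python
import numpy as np

def decode_segmap_vectorized(label_map, palette):
    """Map an H x W array of class ids to an H x W x 3 uint8 RGB image."""
    palette = np.asarray(palette, dtype=np.uint8)
    return palette[label_map]   # fancy indexing picks one RGB row per pixel

# Three-entry toy palette (background, class 1, class 2), not the full VOC table.
toy_palette = [(0, 0, 0), (128, 0, 0), (0, 128, 0)]
toy_labels = np.array([[0, 1],
                       [2, 1]])
rgb_toy = decode_segmap_vectorized(toy_labels, toy_palette)
print(rgb_toy.shape)            # (2, 2, 3)
print(rgb_toy[1, 0].tolist())   # [0, 128, 0]
```

Indexing an (nc x 3) array with an (H x W) integer array yields an (H x W x 3) array directly, so the per-channel arrays and the final stack are not needed.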

Now let's use this function to see the final segmented output!

    rgb = decode_segmap(om)
    plt.imshow(rgb); plt.show()
    
    
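And there we go! We have segmented the image.

That's the bird!

Note: the segmentation map is smaller than the original image because the image was resized and cropped during preprocessing.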
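
If the label map needs to be scaled to another size, nearest-neighbour sampling is the usual choice, because interpolating between class ids would produce meaningless values. The following is a minimal NumPy sketch with a made-up helper name; it only undoes a resize and cannot restore the borders removed by the center crop.

```python
import numpy as np

def resize_labels_nearest(label_map, out_h, out_w):
    """Nearest-neighbour resize of an H x W label map to out_h x out_w."""
    in_h, in_w = label_map.shape
    rows = np.arange(out_h) * in_h // out_h   # source row for each output row
    cols = np.arange(out_w) * in_w // out_w   # source column for each output column
    return label_map[rows[:, None], cols[None, :]]

small = np.array([[0, 3],
                  [3, 0]])
big = resize_labels_nearest(small, 4, 4)
print(big)
# [[0 0 3 3]
#  [0 0 3 3]
#  [3 3 0 0]
#  [3 3 0 0]]
```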

3.1.6 Final Result

Next, let's wrap all of this in a single function and try it on more images.

    def segment(net, path, show_orig=True, dev='cuda'):
      img = Image.open(path)
      if show_orig: plt.imshow(img); plt.axis('off'); plt.show()
      # Comment the Resize and CenterCrop for better inference results
      trf = T.Compose([T.Resize(640),
                   #T.CenterCrop(224),
                   T.ToTensor(),
                   T.Normalize(mean = [0.485, 0.456, 0.406],
                               std = [0.229, 0.224, 0.225])])
      inp = trf(img).unsqueeze(0).to(dev)
      out = net.to(dev)(inp)['out']
      om = torch.argmax(out.squeeze(), dim=0).detach().cpu().numpy()
      rgb = decode_segmap(om)
      plt.imshow(rgb); plt.axis('off'); plt.show()
    
    

    !wget -nv https://www.learnopencv.com/wp-content/uploads/2021/01/horse-segmentation.jpeg -O horse.png
    segment(fcn, './horse.png')
    
    
Output:

    2024-05-15 02:18:46 URL:http://learnopencv.com/wp-content/uploads/2021/01/horse-segmentation.jpeg [128686/128686] -> "horse.png" [1]

3.2 Semantic Segmentation with DeepLab

DeepLab is a semantic segmentation architecture that came out of Google Brain. Let's see how to use it.

    dlab = models.segmentation.deeplabv3_resnet101(pretrained=True).eval()
    
    
Output:

    /usr/local/lib/python3.10/dist-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=DeepLabV3_ResNet101_Weights.COCO_WITH_VOC_LABELS_V1`. You can also use `weights=DeepLabV3_ResNet101_Weights.DEFAULT` to get the most up-to-date weights.
      warnings.warn(msg)
    Downloading: "https://download.pytorch.org/models/deeplabv3_resnet101_coco-586e9e4e.pth" to /root/.cache/torch/hub/checkpoints/deeplabv3_resnet101_coco-586e9e4e.pth
    100%|██████████| 233M/233M [00:01<00:00, 154MB/s]

Let's see how this model does on the same image, using the function we defined above.

    segment(dlab, './horse.png')
    
    
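3.3 Multi-Object Semantic Segmentation

With a more complex image that contains multiple objects, we can start to see some clear differences between the results of the two models.

Let's try it!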

    !wget -nv "https://www.learnopencv.com/wp-content/uploads/2021/01/person-segmentation.jpeg" -O person.png
    img = Image.open('./person.png')
    plt.imshow(img); plt.show()
    
    print ('Segmentation Image on FCN')
    segment(fcn, path='./person.png', show_orig=False)
    
    print ('Segmentation Image on DeepLabv3')
    segment(dlab, path='./person.png', show_orig=False)
    
    
Segmentation Image on FCN

Segmentation Image on DeepLabv3

Now you can see the difference between the models.

FCN fails to capture the continuity of the cow's leg, whereas DeepLabv3 captures it.

Looking more closely at the person's hand resting on the cow, FCN captures it reasonably well, if not perfectly, while DeepLabv3 also picks it up but not quite as well.

These are some of the differences that are visible to the naked eye.
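
Visual differences like these can also be put into numbers. One possible approach, sketched below with NumPy, is to compute the per-class intersection over union (IoU) between the two predicted label maps. The helper name and the toy arrays are assumptions, and without ground-truth masks this measures only how much the two models agree, not which one is right.

```python
import numpy as np

def per_class_iou(pred_a, pred_b, num_classes=21):
    """IoU between two label maps for each class present in either of them."""
    ious = {}
    for c in range(num_classes):
        a, b = pred_a == c, pred_b == c
        union = np.logical_or(a, b).sum()
        if union == 0:
            continue   # class absent from both maps
        ious[c] = float(np.logical_and(a, b).sum() / union)
    return ious

# Toy maps standing in for the FCN and DeepLabv3 outputs (10 = cow, 15 = person).
fcn_map = np.array([[0, 10, 10],
                    [0, 15, 10]])
dlab_map = np.array([[0, 10, 10],
                     [0, 15, 15]])
agreement = per_class_iou(fcn_map, dlab_map)
for class_id, iou in agreement.items():
    print(class_id, round(iou, 3))
```

In this toy case the background matches exactly, while the cow and person classes overlap only partly because the two maps disagree on one pixel.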

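Note: the output image differs in size from the original because preprocessing resizes the input; segment() scales the shorter side to 640 pixels and leaves the center crop commented out.

Feel free to try more images and see how these models perform in different situations!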

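4. Summary

So far we have seen how the code works and how the outputs look qualitatively.

We now turn to the quantitative aspects of the models:

- Inference time on CPU and GPU
- Size of the model
- GPU memory used during inference

Experiment code: 004 Torchvision Semantic Segmentation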

4.1 Inference Time

    import time
    
    def infer_time(net, path='./horse.png', dev='cuda'):
      img = Image.open(path)
      trf = T.Compose([T.Resize(256),
                   T.CenterCrop(224),
                   T.ToTensor(),
                   T.Normalize(mean = [0.485, 0.456, 0.406],
                               std = [0.229, 0.224, 0.225])])
    
      inp = trf(img).unsqueeze(0).to(dev)
    
      st = time.time()
      out1 = net.to(dev)(inp)
      et = time.time()
    
      return et - st
    
    

    avg_over = 100
    
    fcn_infer_time_list_cpu = [infer_time(fcn, dev='cpu') for _ in range(avg_over)]
    fcn_infer_time_avg_cpu = sum(fcn_infer_time_list_cpu) / avg_over
    
    dlab_infer_time_list_cpu = [infer_time(dlab, dev='cpu') for _ in range(avg_over)]
    dlab_infer_time_avg_cpu = sum(dlab_infer_time_list_cpu) / avg_over
    
    
    print ('Inference time for first few calls for FCN      : {}'.format(fcn_infer_time_list_cpu[:10]))
    print ('Inference time for first few calls for DeepLabv3: {}'.format(dlab_infer_time_list_cpu[:10]))
    
    print ('The Average Inference time on FCN is:     {:.2f}s'.format(fcn_infer_time_avg_cpu))
    print ('The Average Inference time on DeepLab is: {:.2f}s'.format(dlab_infer_time_avg_cpu))
    
    
Output on CPU:

    Inference time for first few calls for FCN      : [1.0962293148040771, 1.41335129737854, 1.4978973865509033, 1.1366255283355713, 0.994088888168335, 1.0438227653503418, 1.0235023498535156, 1.0187690258026123, 1.045358657836914, 1.0086448192596436]
    Inference time for first few calls for DeepLabv3: [1.2838966846466064, 1.1889286041259766, 1.4854016304016113, 1.5590059757232666, 1.4725146293640137, 1.1244769096374512, 1.0172410011291504, 0.9699416160583496, 0.9643347263336182, 0.909754753112793]
    The Average Inference time on FCN is:     1.11s
    The Average Inference time on DeepLab is: 1.04s

On GPU


    avg_over = 100
    
    fcn_infer_time_list_gpu = [infer_time(fcn) for _ in range(avg_over)]
    fcn_infer_time_avg_gpu = sum(fcn_infer_time_list_gpu) / avg_over
    
    dlab_infer_time_list_gpu = [infer_time(dlab) for _ in range(avg_over)]
    dlab_infer_time_avg_gpu = sum(dlab_infer_time_list_gpu) / avg_over
    
    print ('Inference time for first few calls for FCN      : {}'.format(fcn_infer_time_list_gpu[:10]))
    print ('Inference time for first few calls for DeepLabv3: {}'.format(dlab_infer_time_list_gpu[:10]))
    
    print ('The Average Inference time on FCN is:     {:.3f}s'.format(fcn_infer_time_avg_gpu))
    print ('The Average Inference time on DeepLab is: {:.3f}s'.format(dlab_infer_time_avg_gpu))
    
    
Output on GPU:

    Inference time for first few calls for FCN      : [0.16111516952514648, 0.018294334411621094, 0.01825737953186035, 0.01937079429626465, 0.018789291381835938, 0.01917719841003418, 0.018662691116333008, 0.01843094825744629, 0.018068552017211914, 0.018398046493530273]
    Inference time for first few calls for DeepLabv3: [0.11033177375793457, 0.020525217056274414, 0.020010948181152344, 0.020022869110107422, 0.0229949951171875, 0.022975921630859375, 0.01997232437133789, 0.02288365364074707, 0.02026224136352539, 0.019922971725463867]
    The Average Inference time on FCN is:     0.025s
    The Average Inference time on DeepLab is: 0.022s

The averages can then be plotted as bar charts:

    plt.bar([0.1, 0.2], [fcn_infer_time_avg_cpu, dlab_infer_time_avg_cpu], width=0.08)
    plt.ylabel('Time taken in Seconds')
    plt.xticks([0.1, 0.2], ['FCN', 'DeepLabv3'])
    plt.title('Inference time of FCN and DeepLabv3 on CPU')
    plt.show()
    
    

    plt.bar([0.1, 0.2], [fcn_infer_time_avg_gpu, dlab_infer_time_avg_gpu], width=0.08)
    plt.ylabel('Time taken in Seconds')
    plt.xticks([0.1, 0.2], ['FCN', 'DeepLabv3'])
    plt.title('Inference time of FCN and DeepLabv3 on GPU')
    plt.show()
    
    
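On these averages DeepLab comes out slightly faster than FCN (1.04 s vs 1.11 s on CPU, 0.022 s vs 0.025 s on GPU), but the gap is small and within the call-to-call variation visible in the lists above.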
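
A few details of the measurement are worth keeping in mind. The first GPU call in the lists above takes roughly 0.1 to 0.16 s while later calls take about 0.02 s, so start-up costs inflate the average. infer_time also includes net.to(dev) in the timed region. In addition, CUDA kernels are launched asynchronously, so stopping the clock without synchronizing can under-report GPU time. The helper below is a generic sketch of a more careful loop; the name benchmark and its parameters are assumptions, and the stand-in workload replaces the model so that the example runs anywhere.

```python
import time

def benchmark(fn, warmup=3, runs=20, sync=None):
    """Average wall-clock seconds per call of fn(), excluding warm-up calls."""
    for _ in range(warmup):
        fn()                      # first calls pay one-off start-up costs
    if sync is not None:
        sync()                    # drain pending asynchronous work before timing
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    if sync is not None:
        sync()                    # wait until the timed work has really finished
    return (time.perf_counter() - start) / runs

# Stand-in workload; with a model one might pass fn=lambda: net(inp)
# and, on CUDA, sync=torch.cuda.synchronize.
avg = benchmark(lambda: sum(i * i for i in range(10_000)))
print('{:.6f}s per call'.format(avg))
```

Moving the model to the device once, outside fn, keeps that transfer out of the measurement.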

4.2 Model Size

    import os
    
    #/root/.cache/torch/hub/checkpoints/fcn_resnet101_coco-7ecb50ca.pth
    #/root/.cache/torch/hub/checkpoints/deeplabv3_resnet101_coco-586e9e4e.pth
    #resnet101_size = os.path.getsize('/root/.cache/torch/hub/checkpoints/resnet101-5d3b4d8f.pth')
    
    fcn_size = os.path.getsize('/root/.cache/torch/hub/checkpoints/fcn_resnet101_coco-7ecb50ca.pth')
    dlab_size = os.path.getsize('/root/.cache/torch/hub/checkpoints/deeplabv3_resnet101_coco-586e9e4e.pth')
    
    fcn_total = fcn_size # + resnet101_size
    dlab_total = dlab_size # + resnet101_size
    
    print ('Size of the FCN model with Resnet101 backbone is:       {:.2f} MB'.format(fcn_total /  (1024 * 1024)))
    print ('Size of the DeepLabv3 model with Resnet101 backbone is: {:.2f} MB'.format(dlab_total / (1024 * 1024)))
    
    
Output:

    Size of the FCN model with Resnet101 backbone is:       207.71 MB
    Size of the DeepLabv3 model with Resnet101 backbone is: 233.22 MB

The two sizes as a bar chart:

    plt.bar([0, 1], [fcn_total / (1024 * 1024), dlab_total / (1024 * 1024)])
    plt.ylabel('Size of the model in MegaBytes')
    plt.xticks([0, 1], ['FCN', 'DeepLabv3'])
    plt.title('Comparison of the model size of FCN and DeepLabv3')
    plt.show()
    
    
Model size here means the size of the weights file on disk. DeepLabv3 is slightly larger than FCN (233.22 MB vs 207.71 MB).
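
The file sizes also give a rough idea of how many values each checkpoint stores. The back-of-the-envelope sketch below assumes 32-bit floats (4 bytes per value) and ignores file-format overhead, so the result is only an approximation; it also counts stored buffers such as batch-norm statistics, not just trainable parameters. The helper name is made up for this example.

```python
def approx_param_count(size_mb, bytes_per_value=4):
    """Rough number of stored values in a checkpoint of size_mb mebibytes."""
    return int(size_mb * 1024 * 1024 / bytes_per_value)

# Sizes taken from the output above.
for name, size_mb in [('FCN', 207.71), ('DeepLabv3', 233.22)]:
    millions = approx_param_count(size_mb) / 1e6
    print('{:<10s} ~{:.1f}M stored values'.format(name, millions))
```

Under these assumptions FCN comes out at roughly 54 million values and DeepLabv3 at roughly 61 million.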

5. References
