Feature Pyramid Networks for object detection
FPN
- Overview
- 1. Introduction
- 2. Related work
- 3. Feature Pyramid Networks
  - Bottom-up pathway
  - Top-down pathway and lateral connections
- 4. Applications
  - For RPN
  - For Fast R-CNN
- Core code reproduction
  - FPN network structure
  - ResNet Bottleneck
  - Full code
Overview
In the figure below, feature maps are indicated by blue outlines, and thicker outlines denote feature maps with stronger semantic information; maps higher in the hierarchy represent more abstract, more general features. By connecting these levels, objects can be detected effectively at multiple scales.

- a) Featurized image pyramid:
The traditional scheme: the image is resized to multiple scales and features are extracted from each scale to build a pyramid, enabling multi-scale detection. Because features must be computed separately at every scale, this is computationally expensive and slow.
- b) Single feature map:
To speed up detection, some modern systems predict from features at a single scale only, avoiding the cost of an image pyramid. This, however, tends to hurt accuracy on small objects.
- c) Pyramidal feature hierarchy (e.g., SSD-style):
An alternative that reuses the pyramidal feature hierarchy computed inside the ConvNet itself for detection, as if it were a featurized image pyramid, without computing features per image scale; this makes detection faster. (Note that the feature maps output by different convolutional layers naturally form a pyramid.)
- d) Feature Pyramid Network (FPN):
The scheme proposed in this paper. It is about as fast as (b) and (c) but significantly more accurate. Through a top-down pathway and lateral connections, FPN combines feature representations from different levels, so that semantically strong features are available at every level, yielding more accurate predictions.
1. Introduction
The main advantage of featurizing each level of an image pyramid is that it yields a multi-scale feature representation in which every level is semantically strong, including the high-resolution levels.
However, featurizing every level has obvious limitations: inference time increases considerably, and memory constraints make it infeasible to train deep networks end to end on an image pyramid.
This paper aims to leverage the pyramidal feature hierarchy that a ConvNet computes naturally, while building a feature pyramid that has strong semantics at all scales. To this end, the authors adopt a top-down architecture with lateral connections that combines low-resolution, semantically strong features with high-resolution, semantically weak features (Figure 1(d)). The result is a feature pyramid with rich semantics at all levels, built quickly from a single input image scale. In other words, the authors show how to create in-network feature pyramids that can replace featurized image pyramids without sacrificing representational power, speed, or memory.
The proposed method is not limited to bounding-box proposals; it also extends to mask proposals.
Moreover, the FPN structure can be trained end to end with all scales and is used consistently at train and test time, which would be infeasible with a traditional image pyramid.
2. Related work
Figure 2 compares two top-down deep architectures for object detection, showing both where features are extracted and where predictions are made.

The top diagram shows a top-down architecture with skip connections, where predictions are made only at the finest (highest-resolution) level, as in [28]: features are passed along the top-down path while skip connections preserve fine details. This strategy works well for image segmentation, but it is limited for object detection, where targets appear at many different scales.
The bottom diagram shows the proposed Feature Pyramid Network (FPN). It has a similar structure, but predictions are made independently at every level. The design exploits the top-down pathway to obtain semantically strong features while fusing top-down and bottom-up representations at each level, so that every level carries rich semantics and detection is accurate and reliable.
The figure illustrates FPN's key idea of combining information from different depths for multi-scale detection: it avoids relying on a single scale for inference, while also avoiding the expensive per-scale feature computation of traditional image pyramids.
3. Feature Pyramid Networks
This work focuses on sliding-window proposers (RPN) and region-based detectors (Fast R-CNN).
The method takes a single-scale image of arbitrary size as input and outputs proportionally sized feature maps at multiple levels, in a fully convolutional fashion.
The authors present results using ResNets. The construction of the pyramid involves three parts: a bottom-up pathway, a top-down pathway, and lateral connections, described below.
Bottom-up pathway
The bottom-up pathway is the feed-forward computation of the backbone ConvNet, which computes a feature hierarchy consisting of feature maps at several scales with a scaling step of 2. Many layers produce output maps of the same size; these layers are said to belong to the same network stage.
For the feature pyramid, one pyramid level is defined per stage, and the output of the last layer of each stage is chosen as the reference set of feature maps. This is a natural choice, since the deepest layer of each stage should have the strongest features.
Specifically, for ResNets the authors use the feature activations output by each stage's last residual block, denoted {C2, C3, C4, C5} for the outputs of conv2, conv3, conv4, and conv5. These have strides of 4, 8, 16, and 32 pixels with respect to the input image. conv1 is not included in the pyramid because of its large memory footprint.
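As a quick sanity check (an illustrative sketch, not from the paper's code), the strides of {C2, C3, C4, C5} determine the feature-map sizes for a given input; for a 224×224 input:

```python
# Illustrative sketch: feature-map sizes of C2..C5 for a 224x224 input,
# given the strides {4, 8, 16, 32} stated above.
input_size = 224
strides = {"C2": 4, "C3": 8, "C4": 16, "C5": 32}
sizes = {name: input_size // s for name, s in strides.items()}
print(sizes)  # {'C2': 56, 'C3': 28, 'C4': 14, 'C5': 7}
```

These 56/28/14/7 spatial sizes match the stage outputs that appear in the model summary later in this post.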
Top-down pathway and lateral connections
The top-down pathway produces higher-resolution features by upsampling spatially coarser but semantically stronger feature maps from higher pyramid levels. These features are then enhanced with features from the bottom-up pathway via lateral connections; each lateral connection merges feature maps of the same spatial size from the two pathways.
Figure 3 shows the building block that constructs the top-down feature maps. The spatially coarser feature map is upsampled by a factor of 2, using nearest-neighbor upsampling for simplicity.

The upsampled map is then merged with the corresponding bottom-up map, whose channel dimension is first reduced by a 1×1 convolution, via element-wise addition. This process is repeated until the finest-resolution map is generated.
To start the iteration, a 1×1 convolution is applied to C5 to produce the coarsest-resolution map. A 3×3 convolution is then applied to each merged map to generate the final feature maps; this reduces the aliasing effect of upsampling. The final set of maps is denoted {P2, P3, P4, P5}, corresponding to {C2, C3, C4, C5} of the same respective spatial sizes.
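One top-down merge step can be sketched in a few lines of PyTorch (a minimal illustration with random tensors; the layer names `lateral` and `smooth` are ours, and channel counts assume 256-channel pyramid features):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Minimal sketch of one merge step: upsample the coarser P5 (nearest-neighbor,
# as in the paper), add the 1x1-reduced lateral C4, then smooth with a 3x3 conv.
lateral = nn.Conv2d(1024, 256, kernel_size=1)           # reduce C4's channels
smooth = nn.Conv2d(256, 256, kernel_size=3, padding=1)  # anti-aliasing 3x3

p5 = torch.randn(1, 256, 7, 7)     # coarser pyramid level
c4 = torch.randn(1, 1024, 14, 14)  # bottom-up stage output
p4 = smooth(F.interpolate(p5, size=c4.shape[-2:], mode="nearest") + lateral(c4))
print(p4.shape)  # torch.Size([1, 256, 14, 14])
```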
Because all levels of the pyramid share classifiers/regressors, as in a traditional featurized image pyramid, the feature dimension (number of channels, d) is fixed across all feature maps; here d = 256, so all the extra convolutional layers have 256-channel outputs.
Simplicity is central to the design: in many experiments the model proved robust to a range of design choices. The authors also tried more sophisticated blocks (e.g., multi-layer residual blocks as the connections) and observed marginally better results; designing better connection modules is beyond this paper's scope, and the simple design described here already achieves good results.
4. Applications
For RPN
For Fast R-CNN
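Although the two application sections are only sketched here, one concrete detail from the paper is how Fast R-CNN assigns each RoI to a pyramid level: an RoI of width w and height h (on the input image) goes to level Pk with k = ⌊k0 + log2(√(wh)/224)⌋, where k0 = 4. A small sketch (the helper name `roi_level` is ours):

```python
import math

# RoI-to-pyramid-level assignment rule from the FPN paper:
# k = floor(k0 + log2(sqrt(w*h) / 224)), with k0 = 4,
# clamped to the available levels P2..P5.
def roi_level(w, h, k0=4):
    k = math.floor(k0 + math.log2(math.sqrt(w * h) / 224))
    return max(2, min(5, k))

print(roi_level(224, 224))  # 4 -> a canonical 224x224 RoI maps to P4
print(roi_level(112, 112))  # 3 -> a 2x smaller RoI maps to the finer P3
print(roi_level(448, 448))  # 5 -> a 2x larger RoI maps to the coarser P5
```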
Core code reproduction
FPN network structure

ResNet Bottleneck
In a ResNet Bottleneck block, the main path and the skip connection must output feature maps with the same spatial size (width × height) and the same number of channels, so that they can be added element-wise.
From the first convolution to the third, the channel count expands fourfold (64 → 64 → 256), hence expansion = 4.
64, 64, and 256 are the output channel counts of the corresponding convolutional layers.

- First, a 1×1 convolution reduces the input channels to 64.
- Next, a 3×3 convolution performs spatial feature extraction over the 64 channels.
- Finally, another 1×1 convolution expands the channels from 64 to 256.
The diagram also shows the residual (skip) connection, which adds the block's input to its output.
```python
class Bottleneck(nn.Module):
    # channel expansion factor of the block's output
    expansion = 4

    def __init__(self, inchannels, outchannels, stride=1, downsample=None):
        super(Bottleneck, self).__init__()
        self.bottleneck = nn.Sequential(
            # first 1x1 conv: reduce the channel dimension
            nn.Conv2d(inchannels, outchannels, kernel_size=1, bias=False),
            nn.BatchNorm2d(outchannels),
            nn.ReLU(inplace=True),
            # second 3x3 conv: feature extraction
            nn.Conv2d(outchannels, outchannels, kernel_size=3, stride=stride, padding=1, bias=False),
            nn.BatchNorm2d(outchannels),
            nn.ReLU(inplace=True),
            # third 1x1 conv: expand the channel dimension
            nn.Conv2d(outchannels, self.expansion * outchannels, kernel_size=1, bias=False),
            nn.BatchNorm2d(self.expansion * outchannels)
        )
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample

    def forward(self, x):
        identity = x
        out = self.bottleneck(x)
        if self.downsample is not None:
            identity = self.downsample(x)
        out += identity
        out = self.relu(out)
        return out
```

完整代码
import torch
import torch.nn as nn
import torch.nn.functional as F
import math
from torchsummary import summary
class Bottleneck(nn.Module):
# 输出通道倍增数
expansion = 4
def __init__(self, inchannels, outchannels, stride=1, downsample=None):
super(Bottleneck, self).__init__()
self.bottleneck = nn.Sequential(
# 第一个1*1卷积核降低通道维度
nn.Conv2d(inchannels, outchannels, kernel_size=1, bias=False),
nn.BatchNorm2d(outchannels),
nn.ReLU(inplace=True),
# 第二个3*3卷积核特征提取
nn.Conv2d(outchannels, outchannels, kernel_size=3, stride=stride, padding=1, bias=False),
nn.BatchNorm2d(outchannels),
nn.ReLU(inplace=True),
# 第三个1*1卷积核升高通道维度
nn.Conv2d(outchannels, self.expansion * outchannels, kernel_size=1, bias=False),
nn.BatchNorm2d(self.expansion * outchannels)
)
self.relu = nn.ReLU(inplace=True)
self.downsample = downsample
def forward(self, x):
identity = x
out = self.bottleneck(x)
# stride!=1 或者输入通道和输出通道数不一致时,downsample不为空
if self.downsample is not None:
identity = self.downsample(x)
# 在残差网络中,主路径和残差路径的输出必须具有相同的尺寸和通道数,才能进行element-wise相加!!
# 如果不对x进行下采样,那identity的空间尺寸或通道数是和out不一致的
out += identity
out = self.relu(out)
return out
class FPN(nn.Module):
#接收的参数layers是一个列表,每个元素数值代表不同阶段bottleneck块的数量
def __init__(self, layers):
super(FPN, self).__init__()
self.inplanes = 64
# 处理C1模块,如果输入空间尺寸是224*224,经过conv1之后尺寸变为112*112
self.conv1 = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=7, stride=2, padding=3, bias=False)
self.bn1 = nn.BatchNorm2d(64)
self.relu = nn.ReLU(inplace=True)
# maxpool将特征图大小改变为 56*56
self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
# 搭建自底向上的C2,C3,C4,C5模块
#resnet50 中 layers[0] = 3,layers[1] = 4,layers[2] = 6,layers[3] = 3,
# layers[i]代表各个阶段Bottleneck块的数量
# 64,128,256,512 是输出通道数,_make_layers会返回各个阶段包含的一系列Bottleneck块,
#conv2_x的输入通道64,conv3_x的输入通道128,conv4_x的输入通道256,conv5_x的输入通道512
self.layer1 = self._make_layers(64, layers[0]) #构建C2
self.layer2 = self._make_layers(128, layers[1], stride=2) #构建C3
self.layer3 = self._make_layers(256, layers[2], stride=2) #构建C4
self.layer4 = self._make_layers(512, layers[3], stride=2) #构建C5
# 定义toplayer层,用于后面对C5降低输出通道数,得到P5,
#这里输入通道是2048是因为resnet50及以上的模型经过conv5_x之后输出的通道数是2048
self.toplayer = nn.Conv2d(in_channels=2048, out_channels=256, kernel_size=1, stride=1, padding=0)
# 3*3卷积融合,目的是消除上采样过程带来的重叠效应,以生成最终的特征图
self.smooth1 = nn.Conv2d(in_channels=256, out_channels=256, kernel_size=3, stride=1, padding=1)
self.smooth2 = nn.Conv2d(in_channels=256, out_channels=256, kernel_size=3, stride=1, padding=1)
self.smooth3 = nn.Conv2d(in_channels=256, out_channels=256, kernel_size=3, stride=1, padding=1)
# 横向连接
self.latlayer1 = nn.Conv2d(in_channels=1024, out_channels=256, kernel_size=1, stride=1, padding=0)
self.latlayer2 = nn.Conv2d(in_channels=512, out_channels=256, kernel_size=1, stride=1, padding=0)
self.latlayer3 = nn.Conv2d(in_channels=256, out_channels=256, kernel_size=1, stride=1, padding=0)
# blocks参数代表各个阶段Bottleneck块的数量,来自于layers列表中的元素
# stride参数在第一个Bottleneck块中使用,用于控制特征图的空间尺寸是否减半。
# 如果stride等于2,各个阶段第一个Bottleneck块将会使特征图的高度和宽度减半。
def _make_layers(self, inchannels, blocks, stride=1):
downsample = None
# stride!=1时需要对特征图进行下采样,
# 输入通道数和输出通道数不等时,需要使用1*1卷积核进行通道数调整。
#在残差网络中,主路径和残差路径的输出必须具有相同的尺寸和通道数,才能进行element-wise相加!!
if stride != 1 or self.inplanes != Bottleneck.expansion * inchannels:
downsample = nn.Sequential(
nn.Conv2d(self.inplanes, Bottleneck.expansion * inchannels,
kernel_size=1, stride=stride, bias = False),
nn.BatchNorm2d(Bottleneck.expansion * inchannels)
)
# layers是一个Python列表,它用于存储一系列的Bottleneck块。
# 这些Bottleneck块在一起构成了ResNet的一个阶段(stage),其中每个阶段的输出特征图大小可能会减半,而通道数可能会增加
# 这里的layers和上面的layers不一样,
layers = []
layers.append(Bottleneck(self.inplanes, inchannels, stride, downsample))
# 因为经过第一个Bottleneck之后,第二个Bottleneck块的输入通道数是第一个Bottleneck块的输出通道数,
# 所以需要通道倍增
self.inplanes = inchannels * Bottleneck.expansion
for i in range(1, blocks):
# 这里没有指定stride=stride,所以是默认的stride=1
layers.append(Bottleneck(self.inplanes, inchannels))
return nn.Sequential(*layers)
def _upsample_and_add(self, x, y):
_,_,H,W = y.shape
# x是较高层的特征图,上采样到和较低层y一样的空间尺寸后,和y进行元素级加法
return F.upsample(x, size=(H,W), mode='bilinear') + y
def forward(self, x):
# 自下而上
c1 = self.maxpool(self.relu(self.bn1(self.conv1(x))))
#c2是conv2_x阶段经过一些列bottleneck块的输出,输出通道数为256
c2 = self.layer1(c1)
#c3是conv3_x阶段经过一些列bottleneck块的输出,输出通道数为512
c3 = self.layer2(c2)
#c4是conv4_x阶段经过一些列bottleneck块的输出,输出通道数为1024
c4 = self.layer3(c3)
#c5是conv5_x阶段经过一些列bottleneck块的输出,输出通道数为2048
c5 = self.layer4(c4)
# 自顶向下和横向连接
# toplayer使用1*1卷积核调整通道数为256,latlayer1,2,3的通道数都是256
p5 = self.toplayer(c5)
#c4输出的通道数是1024,所以latlayer1的输入通道数是1024,同理latlayer2,3也是
p4 = self._upsample_and_add(p5, self.latlayer1(c4))
p3 = self._upsample_and_add(p4, self.latlayer2(c3))
p2 = self._upsample_and_add(p3, self.latlayer3(c2))
# 卷积融合,平滑处理
p4 = self.smooth1(p4)
p3 = self.smooth2(p3)
p2 = self.smooth3(p2)
return p2, p3, p4, p5
if __name__ == '__main__':
fpn = FPN([3, 4, 6, 3]).cuda()
summary(fpn, (3, 224, 224))
'''
if __name__ == '__main__':
model = FPN([3, 4, 6, 3])
print(model)
input = torch.randn(1, 3, 224, 224)
out = model(input)
# 遍历输出tuple中的每个元素,并打印其尺寸信息
for i, feature_map in enumerate(out):
print(f"Feature map {i} size: {feature_map.size()}")
'''
python

Output of running the script:
----------------------------------------------------------------
Layer (type) Output Shape Param #
================================================================
Conv2d-1 [-1, 64, 112, 112] 9,408
BatchNorm2d-2 [-1, 64, 112, 112] 128
ReLU-3 [-1, 64, 112, 112] 0
MaxPool2d-4 [-1, 64, 56, 56] 0
Conv2d-5 [-1, 64, 56, 56] 4,096
BatchNorm2d-6 [-1, 64, 56, 56] 128
ReLU-7 [-1, 64, 56, 56] 0
Conv2d-8 [-1, 64, 56, 56] 36,864
BatchNorm2d-9 [-1, 64, 56, 56] 128
ReLU-10 [-1, 64, 56, 56] 0
Conv2d-11 [-1, 256, 56, 56] 16,384
BatchNorm2d-12 [-1, 256, 56, 56] 512
Conv2d-13 [-1, 256, 56, 56] 16,384
BatchNorm2d-14 [-1, 256, 56, 56] 512
ReLU-15 [-1, 256, 56, 56] 0
Bottleneck-16 [-1, 256, 56, 56] 0
Conv2d-17 [-1, 64, 56, 56] 16,384
BatchNorm2d-18 [-1, 64, 56, 56] 128
ReLU-19 [-1, 64, 56, 56] 0
Conv2d-20 [-1, 64, 56, 56] 36,864
BatchNorm2d-21 [-1, 64, 56, 56] 128
ReLU-22 [-1, 64, 56, 56] 0
Conv2d-23 [-1, 256, 56, 56] 16,384
BatchNorm2d-24 [-1, 256, 56, 56] 512
ReLU-25 [-1, 256, 56, 56] 0
Bottleneck-26 [-1, 256, 56, 56] 0
Conv2d-27 [-1, 64, 56, 56] 16,384
BatchNorm2d-28 [-1, 64, 56, 56] 128
ReLU-29 [-1, 64, 56, 56] 0
Conv2d-30 [-1, 64, 56, 56] 36,864
BatchNorm2d-31 [-1, 64, 56, 56] 128
ReLU-32 [-1, 64, 56, 56] 0
Conv2d-33 [-1, 256, 56, 56] 16,384
BatchNorm2d-34 [-1, 256, 56, 56] 512
ReLU-35 [-1, 256, 56, 56] 0
Bottleneck-36 [-1, 256, 56, 56] 0
Conv2d-37 [-1, 128, 56, 56] 32,768
BatchNorm2d-38 [-1, 128, 56, 56] 256
ReLU-39 [-1, 128, 56, 56] 0
Conv2d-40 [-1, 128, 28, 28] 147,456
BatchNorm2d-41 [-1, 128, 28, 28] 256
ReLU-42 [-1, 128, 28, 28] 0
Conv2d-43 [-1, 512, 28, 28] 65,536
BatchNorm2d-44 [-1, 512, 28, 28] 1,024
Conv2d-45 [-1, 512, 28, 28] 131,072
BatchNorm2d-46 [-1, 512, 28, 28] 1,024
ReLU-47 [-1, 512, 28, 28] 0
Bottleneck-48 [-1, 512, 28, 28] 0
Conv2d-49 [-1, 128, 28, 28] 65,536
BatchNorm2d-50 [-1, 128, 28, 28] 256
ReLU-51 [-1, 128, 28, 28] 0
Conv2d-52 [-1, 128, 28, 28] 147,456
BatchNorm2d-53 [-1, 128, 28, 28] 256
ReLU-54 [-1, 128, 28, 28] 0
Conv2d-55 [-1, 512, 28, 28] 65,536
BatchNorm2d-56 [-1, 512, 28, 28] 1,024
ReLU-57 [-1, 512, 28, 28] 0
Bottleneck-58 [-1, 512, 28, 28] 0
Conv2d-59 [-1, 128, 28, 28] 65,536
BatchNorm2d-60 [-1, 128, 28, 28] 256
ReLU-61 [-1, 128, 28, 28] 0
Conv2d-62 [-1, 128, 28, 28] 147,456
BatchNorm2d-63 [-1, 128, 28, 28] 256
ReLU-64 [-1, 128, 28, 28] 0
Conv2d-65 [-1, 512, 28, 28] 65,536
BatchNorm2d-66 [-1, 512, 28, 28] 1,024
ReLU-67 [-1, 512, 28, 28] 0
Bottleneck-68 [-1, 512, 28, 28] 0
Conv2d-69 [-1, 128, 28, 28] 65,536
BatchNorm2d-70 [-1, 128, 28, 28] 256
ReLU-71 [-1, 128, 28, 28] 0
Conv2d-72 [-1, 128, 28, 28] 147,456
BatchNorm2d-73 [-1, 128, 28, 28] 256
ReLU-74 [-1, 128, 28, 28] 0
Conv2d-75 [-1, 512, 28, 28] 65,536
BatchNorm2d-76 [-1, 512, 28, 28] 1,024
ReLU-77 [-1, 512, 28, 28] 0
Bottleneck-78 [-1, 512, 28, 28] 0
Conv2d-79 [-1, 256, 28, 28] 131,072
BatchNorm2d-80 [-1, 256, 28, 28] 512
ReLU-81 [-1, 256, 28, 28] 0
Conv2d-82 [-1, 256, 14, 14] 589,824
BatchNorm2d-83 [-1, 256, 14, 14] 512
ReLU-84 [-1, 256, 14, 14] 0
Conv2d-85 [-1, 1024, 14, 14] 262,144
BatchNorm2d-86 [-1, 1024, 14, 14] 2,048
Conv2d-87 [-1, 1024, 14, 14] 524,288
BatchNorm2d-88 [-1, 1024, 14, 14] 2,048
ReLU-89 [-1, 1024, 14, 14] 0
Bottleneck-90 [-1, 1024, 14, 14] 0
Conv2d-91 [-1, 256, 14, 14] 262,144
BatchNorm2d-92 [-1, 256, 14, 14] 512
ReLU-93 [-1, 256, 14, 14] 0
Conv2d-94 [-1, 256, 14, 14] 589,824
BatchNorm2d-95 [-1, 256, 14, 14] 512
ReLU-96 [-1, 256, 14, 14] 0
Conv2d-97 [-1, 1024, 14, 14] 262,144
BatchNorm2d-98 [-1, 1024, 14, 14] 2,048
ReLU-99 [-1, 1024, 14, 14] 0
Bottleneck-100 [-1, 1024, 14, 14] 0
Conv2d-101 [-1, 256, 14, 14] 262,144
BatchNorm2d-102 [-1, 256, 14, 14] 512
ReLU-103 [-1, 256, 14, 14] 0
Conv2d-104 [-1, 256, 14, 14] 589,824
BatchNorm2d-105 [-1, 256, 14, 14] 512
ReLU-106 [-1, 256, 14, 14] 0
Conv2d-107 [-1, 1024, 14, 14] 262,144
BatchNorm2d-108 [-1, 1024, 14, 14] 2,048
ReLU-109 [-1, 1024, 14, 14] 0
Bottleneck-110 [-1, 1024, 14, 14] 0
Conv2d-111 [-1, 256, 14, 14] 262,144
BatchNorm2d-112 [-1, 256, 14, 14] 512
ReLU-113 [-1, 256, 14, 14] 0
Conv2d-114 [-1, 256, 14, 14] 589,824
BatchNorm2d-115 [-1, 256, 14, 14] 512
ReLU-116 [-1, 256, 14, 14] 0
Conv2d-117 [-1, 1024, 14, 14] 262,144
BatchNorm2d-118 [-1, 1024, 14, 14] 2,048
ReLU-119 [-1, 1024, 14, 14] 0
Bottleneck-120 [-1, 1024, 14, 14] 0
Conv2d-121 [-1, 256, 14, 14] 262,144
BatchNorm2d-122 [-1, 256, 14, 14] 512
ReLU-123 [-1, 256, 14, 14] 0
Conv2d-124 [-1, 256, 14, 14] 589,824
BatchNorm2d-125 [-1, 256, 14, 14] 512
ReLU-126 [-1, 256, 14, 14] 0
Conv2d-127 [-1, 1024, 14, 14] 262,144
BatchNorm2d-128 [-1, 1024, 14, 14] 2,048
ReLU-129 [-1, 1024, 14, 14] 0
Bottleneck-130 [-1, 1024, 14, 14] 0
Conv2d-131 [-1, 256, 14, 14] 262,144
BatchNorm2d-132 [-1, 256, 14, 14] 512
ReLU-133 [-1, 256, 14, 14] 0
Conv2d-134 [-1, 256, 14, 14] 589,824
BatchNorm2d-135 [-1, 256, 14, 14] 512
ReLU-136 [-1, 256, 14, 14] 0
Conv2d-137 [-1, 1024, 14, 14] 262,144
BatchNorm2d-138 [-1, 1024, 14, 14] 2,048
ReLU-139 [-1, 1024, 14, 14] 0
Bottleneck-140 [-1, 1024, 14, 14] 0
Conv2d-141 [-1, 512, 14, 14] 524,288
BatchNorm2d-142 [-1, 512, 14, 14] 1,024
ReLU-143 [-1, 512, 14, 14] 0
Conv2d-144 [-1, 512, 7, 7] 2,359,296
BatchNorm2d-145 [-1, 512, 7, 7] 1,024
ReLU-146 [-1, 512, 7, 7] 0
Conv2d-147 [-1, 2048, 7, 7] 1,048,576
BatchNorm2d-148 [-1, 2048, 7, 7] 4,096
Conv2d-149 [-1, 2048, 7, 7] 2,097,152
BatchNorm2d-150 [-1, 2048, 7, 7] 4,096
ReLU-151 [-1, 2048, 7, 7] 0
Bottleneck-152 [-1, 2048, 7, 7] 0
Conv2d-153 [-1, 512, 7, 7] 1,048,576
BatchNorm2d-154 [-1, 512, 7, 7] 1,024
ReLU-155 [-1, 512, 7, 7] 0
Conv2d-156 [-1, 512, 7, 7] 2,359,296
BatchNorm2d-157 [-1, 512, 7, 7] 1,024
ReLU-158 [-1, 512, 7, 7] 0
Conv2d-159 [-1, 2048, 7, 7] 1,048,576
BatchNorm2d-160 [-1, 2048, 7, 7] 4,096
ReLU-161 [-1, 2048, 7, 7] 0
Bottleneck-162 [-1, 2048, 7, 7] 0
Conv2d-163 [-1, 512, 7, 7] 1,048,576
BatchNorm2d-164 [-1, 512, 7, 7] 1,024
ReLU-165 [-1, 512, 7, 7] 0
Conv2d-166 [-1, 512, 7, 7] 2,359,296
BatchNorm2d-167 [-1, 512, 7, 7] 1,024
ReLU-168 [-1, 512, 7, 7] 0
Conv2d-169 [-1, 2048, 7, 7] 1,048,576
BatchNorm2d-170 [-1, 2048, 7, 7] 4,096
ReLU-171 [-1, 2048, 7, 7] 0
Bottleneck-172 [-1, 2048, 7, 7] 0
Conv2d-173 [-1, 256, 7, 7] 524,544
Conv2d-174 [-1, 256, 14, 14] 262,400
Conv2d-175 [-1, 256, 28, 28] 131,328
Conv2d-176 [-1, 256, 56, 56] 65,792
Conv2d-177 [-1, 256, 14, 14] 590,080
Conv2d-178 [-1, 256, 28, 28] 590,080
Conv2d-179 [-1, 256, 56, 56] 590,080
================================================================
Total params: 26,262,336
Trainable params: 26,262,336
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.57
Forward/backward pass size (MB): 302.71
Params size (MB): 100.18
Estimated Total Size (MB): 403.47
----------------------------------------------------------------

Inspecting the FPN model
```python
if __name__ == '__main__':
    model = FPN([3, 4, 6, 3])
    print(model)
    input = torch.randn(1, 3, 224, 224)
    out = model(input)
    # iterate over the output tuple and print each feature map's size
    for i, feature_map in enumerate(out):
        print(f"Feature map {i} size: {feature_map.size()}")
```
The printed output:
FPN(
(conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
(layer1): Sequential(
(0): Bottleneck(
(bottleneck): Sequential(
(0): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ReLU(inplace=True)
(6): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(7): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(relu): ReLU(inplace=True)
(downsample): Sequential(
(0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): Bottleneck(
(bottleneck): Sequential(
(0): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ReLU(inplace=True)
(6): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(7): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(relu): ReLU(inplace=True)
)
(2): Bottleneck(
(bottleneck): Sequential(
(0): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ReLU(inplace=True)
(6): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(7): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(relu): ReLU(inplace=True)
)
)
(layer2): Sequential(
(0): Bottleneck(
(bottleneck): Sequential(
(0): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv2d(128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(4): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ReLU(inplace=True)
(6): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(7): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(relu): ReLU(inplace=True)
(downsample): Sequential(
(0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): Bottleneck(
(bottleneck): Sequential(
(0): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(4): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ReLU(inplace=True)
(6): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(7): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(relu): ReLU(inplace=True)
)
(2): Bottleneck(
(bottleneck): Sequential(
(0): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(4): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ReLU(inplace=True)
(6): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(7): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(relu): ReLU(inplace=True)
)
(3): Bottleneck(
(bottleneck): Sequential(
(0): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(4): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ReLU(inplace=True)
(6): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(7): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(relu): ReLU(inplace=True)
)
)
(layer3): Sequential(
(0): Bottleneck(
(bottleneck): Sequential(
(0): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv2d(256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(4): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ReLU(inplace=True)
(6): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(7): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(relu): ReLU(inplace=True)
(downsample): Sequential(
(0): Conv2d(512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): Bottleneck(
(bottleneck): Sequential(
(0): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(4): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ReLU(inplace=True)
(6): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(7): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(relu): ReLU(inplace=True)
)
(2): Bottleneck(
(bottleneck): Sequential(
(0): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(4): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ReLU(inplace=True)
(6): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(7): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(relu): ReLU(inplace=True)
)
(3): Bottleneck(
(bottleneck): Sequential(
(0): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(4): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ReLU(inplace=True)
(6): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(7): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(relu): ReLU(inplace=True)
)
(4): Bottleneck(
(bottleneck): Sequential(
(0): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(4): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ReLU(inplace=True)
(6): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(7): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(relu): ReLU(inplace=True)
)
(5): Bottleneck(
(bottleneck): Sequential(
(0): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(4): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ReLU(inplace=True)
(6): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(7): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(relu): ReLU(inplace=True)
)
)
(layer4): Sequential(
(0): Bottleneck(
(bottleneck): Sequential(
(0): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv2d(512, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(4): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ReLU(inplace=True)
(6): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
(7): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(relu): ReLU(inplace=True)
(downsample): Sequential(
(0): Conv2d(1024, 2048, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): Bottleneck(
(bottleneck): Sequential(
(0): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(4): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ReLU(inplace=True)
(6): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
(7): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(relu): ReLU(inplace=True)
)
(2): Bottleneck(
(bottleneck): Sequential(
(0): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(4): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ReLU(inplace=True)
(6): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
(7): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(relu): ReLU(inplace=True)
)
)
(toplayer): Conv2d(2048, 256, kernel_size=(1, 1), stride=(1, 1))
(smooth1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(smooth2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(smooth3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(latlayer1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1))
(latlayer2): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1))
(latlayer3): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1))
)
Feature map 0 size: torch.Size([1, 256, 56, 56])
Feature map 1 size: torch.Size([1, 256, 28, 28])
Feature map 2 size: torch.Size([1, 256, 14, 14])
Feature map 3 size: torch.Size([1, 256, 7, 7])

