Reproducing the Faster R-CNN Algorithm
- Faster R-CNN implementation pipeline
- Code
- TODO
Reproducing the Faster R-CNN algorithm.
Original Faster R-CNN paper: https://arxiv.org/abs/1506.01497
Faster R-CNN implementation pipeline
Faster R-CNN is a member of the two-stage object detection family, which evolved from R-CNN through SPP-Net and Fast R-CNN to Faster R-CNN. Faster R-CNN was the first to use a deep network to extract candidate regions, making object detection truly end to end, and it is the pivotal node of the two-stage line: both Mask R-CNN and Cascade R-CNN, which came later, are built on top of it. Here I implement a simplified Faster R-CNN to strengthen my own understanding of it.
In an earlier Tianchi competition I used Faster R-CNN with FPN, made some improvements, and achieved decent results, but that work was built on top of the mmdetection framework, which inevitably hides some details. Implementing Faster R-CNN and FPN from scratch this time gave me a much better grasp of those details, and having finished Faster R-CNN I believe I can now produce simplified reimplementations of both two-stage and one-stage detectors. The figure below shows the structure of Faster R-CNN.

The implementation of Faster R-CNN is divided into five stages:
In stage 1, the input image and its annotated boxes (hereafter ground truth) are used to derive each anchor's true label and offset coordinates; these are later used to compute the loss between the RPN's predictions and the true values, which updates the RPN's parameters. The offsets are defined as:
dx=(gt_x-anchor_x)/anchor_w\tag{1}
dy=(gt_y-anchor_y)/anchor_h\tag{2}
dw=log(gt_w/anchor_w)\tag{3}
dh=log(gt_h/anchor_h)\tag{4}
Here dx, dy, dw, dh are the anchor's offsets relative to the ground truth; gt_x, gt_y are the ground-truth box's center coordinates and gt_w, gt_h its width and height; anchor_x, anchor_y are the anchor's center coordinates and anchor_w, anchor_h its width and height. Each anchor's true label (0 or 1) is determined from its IoU with the ground truth. The task of this stage is to assign every anchor its true offsets and its true label, i.e. to produce gt_anchor_locations and gt_anchor_labels, which are combined with the RPN's predictions pred_anchor_locations and pred_anchor_labels to compute the loss.
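To make Eqs. (1)-(4) concrete, here is a minimal sketch of the encoding for a single anchor and ground-truth pair; encode_offsets is a hypothetical helper written only for illustration, and the box values are made up:

import numpy as np

def encode_offsets(anchor, gt):
    # Boxes are (x1, y1, x2, y2); convert to center/size form first
    aw, ah = anchor[2] - anchor[0], anchor[3] - anchor[1]
    ax, ay = anchor[0] + aw/2, anchor[1] + ah/2
    gw, gh = gt[2] - gt[0], gt[3] - gt[1]
    gx, gy = gt[0] + gw/2, gt[1] + gh/2
    # Eqs. (1)-(4)
    return np.array([(gx-ax)/aw, (gy-ay)/ah, np.log(gw/aw), np.log(gh/ah)])

print(encode_offsets(np.array([100., 100., 200., 200.]), np.array([110., 90., 230., 210.])))
# -> [0.2, 0.0, 0.1823, 0.1823]: the gt center is shifted by 0.2 anchor-widths in x,
#    and the gt is 1.2x the anchor in width and height (log(1.2) ~ 0.1823)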
In stage 2, the RPN predicts offsets and a class label for every anchor, outputting pred_anchor_locations and pred_anchor_labels; this is the core component of the method (originally illustrated in Fig. 3). Concretely, the backbone produces a 50×50 feature map with 512 channels. A 3×3 convolution with padding 1 is applied first, so the spatial size is preserved, followed by two parallel 1×1 convolution branches: one regresses the position offsets of the 9 anchors at each position, and the other predicts their class scores.
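As a quick shape check of this head (a minimal sketch; the full RPN class appears in the code section below), the 3×3 convolution with padding 1 keeps the 50×50 resolution, and the two 1×1 branches emit 9*4 regression channels and 9*2 classification channels per position:

import torch
import torch.nn as nn

conv = nn.Conv2d(512, 512, 3, 1, 1)  # 3x3, stride 1, padding 1: spatial size preserved
reg = nn.Conv2d(512, 9*4, 1, 1, 0)   # 4 offsets for each of the 9 anchors per position
cls = nn.Conv2d(512, 9*2, 1, 1, 0)   # 2 class scores for each of the 9 anchors per position
x = conv(torch.zeros(1, 512, 50, 50))  # stand-in for the backbone feature map
print(reg(x).shape, cls(x).shape)  # torch.Size([1, 36, 50, 50]) torch.Size([1, 18, 50, 50])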

In stage 3, the offsets dx, dy, dw, dh in pred_anchor_locations are combined with the original anchor coordinates to invert the stage-1 transform and recover the boxes the RPN predicts, as corner coordinates (x1, y1, x2, y2). The boxes are sorted by score; the top 12000 candidates go through NMS, and the top 2000 survivors become the final candidate regions. These candidates are then matched against the ground-truth boxes and sampled, and their true labels and offset coordinates (locations) relative to the ground truth are computed. During this matching, candidates are filtered by IoU: a region whose IoU with some ground-truth box is at least 0.5 is marked as a positive sample and is assigned that box's class label, so the true classes cover the full set of object categories rather than just foreground/background. The offset formulas are the same as in stage 1. The selected regions are then sampled further, producing 128 samples in total, of which positives make up 25% and the rest are negatives. The stage's outputs are the RPN predictions pred_anchor_locations and pred_anchor_labels together with the matched gt_roi_labels and gt_roi_locations of the 128 samples.
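The decoding used here is the exact inverse of the stage-1 encoding; a minimal sketch for one anchor and one predicted offset vector (decode_offsets is a hypothetical helper for illustration):

import numpy as np

def decode_offsets(anchor, loc):
    # anchor is (x1, y1, x2, y2), loc is (dx, dy, dw, dh)
    aw, ah = anchor[2] - anchor[0], anchor[3] - anchor[1]
    ax, ay = anchor[0] + aw/2, anchor[1] + ah/2
    cx, cy = loc[0]*aw + ax, loc[1]*ah + ay      # predicted center
    w, h = np.exp(loc[2])*aw, np.exp(loc[3])*ah  # predicted size
    return np.array([cx - w/2, cy - h/2, cx + w/2, cy + h/2])

# Round-trips with the stage-1 encoding: decoding the encoded offsets recovers the gt box
print(decode_offsets(np.array([100., 100., 200., 200.]), np.array([0.2, 0.0, np.log(1.2), np.log(1.2)])))
# -> [110., 90., 230., 210.]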
At the start of stage 4, we have the pred_anchor_labels and pred_anchor_locations produced by the RPN in stage 2, and the 128 sample_rois from stage 3 together with their true labels and offsets relative to the ground truth (gt_roi_labels and gt_roi_locations). Stage 4 first feeds the sample_rois into an RoI pooling layer to obtain a fixed 7×7×512 feature map per RoI, flattens it into a (1, 25088) feature vector, and passes it through two fully connected layers to get a final (1, 4096) feature vector. Two branches then predict the object class (num_class+1 classes in total, including background) and the offset coordinates (4 per class), yielding the final pred_roi_labels and pred_roi_locations.
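RoI pooling is realized below with adaptive max pooling, which squeezes a feature-map crop of arbitrary size down to 7×7; a minimal sketch of just that step (the sizes match the code section, the crop itself is a random stand-in):

import torch
import torch.nn as nn

roi_pool = nn.AdaptiveMaxPool2d((7, 7))
crop = torch.randn(512, 19, 33)     # a 512-channel crop of arbitrary spatial size
pooled = roi_pool(crop)
print(pooled.shape)                 # torch.Size([512, 7, 7])
print(pooled.reshape(1, -1).shape)  # torch.Size([1, 25088]), the FC layers' input size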
In stage 5, the losses are computed from the results of the first four stages. The RPN loss is computed from gt_anchor_labels, gt_anchor_locations, pred_anchor_labels, and pred_anchor_locations; the RoI loss from gt_roi_labels, gt_roi_locations, pred_roi_labels, and pred_roi_locations. The classification losses use cross entropy and the regression losses use smooth L1, giving rpn_cls_loss, rpn_loc_loss, roi_cls_loss, and roi_loc_loss. Note that the classification loss is computed over all boxes while the regression loss is computed only over boxes with meaningful sample labels, so when combining them the regression loss is multiplied by 10 (equivalently, the classification loss is divided by 10), i.e.
rpn_loss = rpn_cls_loss/10 + rpn_loc_loss,
roi_loss = roi_cls_loss/10 + roi_loc_loss,
total_loss = rpn_loss + roi_loss.
Finally the weights are updated from the loss. The cross-entropy loss is given in Eq. (5) and the smooth L1 loss in Eq. (6).
L=-\sum_{c=1}^{M} y_{c} \log \left(p_{c}\right)\tag{5}
L=\left\{\begin{array}{ll}{0.5 x^{2},} & {|x|<1} \\ {|x|-0.5,} & {|x| \geq 1}\end{array}\right.\tag{6}
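A minimal sketch of both losses as they are used here; smooth_l1 is a hypothetical helper mirroring Eq. (6), and the tensors are random stand-ins:

import torch
import torch.nn.functional as F

def smooth_l1(pred, target):
    x = torch.abs(pred - target)
    # Eq. (6): 0.5*x^2 where |x| < 1, |x| - 0.5 elsewhere
    return ((x < 1).float()*0.5*x**2 + (x >= 1).float()*(x - 0.5)).sum()

logits = torch.randn(4, 3)  # 4 boxes, 3 classes
targets = torch.tensor([0, 2, 1, 2])
print(F.cross_entropy(logits, targets))  # Eq. (5), averaged over the boxes
print(smooth_l1(torch.randn(4, 4), torch.randn(4, 4)))  # Eq. (6), summed over the offsets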
Code
Helper module util.py:
import numpy as np

def iou(valid_anchors, gt_box):
    # Takes two sets of boxes given as top-left and bottom-right corners, shape n*4
    # Every valid_anchor has an IoU with every gt_box
    # Returns ious with shape (len(valid_anchors), len(gt_box))
    valid_anchors_num = valid_anchors.shape[0]
    gt_box_num = gt_box.shape[0]
    ious = np.empty((valid_anchors_num, gt_box_num))
    ious.fill(0)
    for i, anchor in enumerate(valid_anchors):
        xa1, ya1, xa2, ya2 = anchor
        area1 = (xa2-xa1)*(ya2-ya1)
        for j, bbox in enumerate(gt_box):
            xb1, yb1, xb2, yb2 = bbox
            area2 = (xb2-xb1)*(yb2-yb1)
            xx1 = np.max([xa1, xb1])
            yy1 = np.max([ya1, yb1])
            xx2 = np.min([xa2, xb2])
            yy2 = np.min([ya2, yb2])
            if(xx1 < xx2 and yy1 < yy2):
                inter_area = (yy2-yy1)*(xx2-xx1)
                iou = inter_area/(area1+area2-inter_area)
                ious[i, j] = iou
    return ious

def nms(bboxes, thre, scores):
    # Input: n*4 boxes, the IoU threshold thre, and each box's score (all numpy arrays)
    # Output: the indices of the boxes remaining after NMS
    x1 = bboxes[:, 0]
    y1 = bboxes[:, 1]
    x2 = bboxes[:, 2]
    y2 = bboxes[:, 3]
    areas = (x2-x1)*(y2-y1)
    order = np.argsort(scores)[::-1]
    keep = []  # indices of the boxes kept after NMS
    while order.size > 0:
        i = order[0]  # index of the highest remaining score
        keep.append(i)
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        w = np.maximum(0, xx2-xx1)
        h = np.maximum(0, yy2-yy1)
        inter = w*h
        ious = inter/(areas[i]+areas[order[1:]]-inter)
        indexes = np.where(ious < thre)[0]
        order = order[indexes+1]
    return keep
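A quick sanity check for the two helpers, assuming the module above is saved as util.py (matching the import in faster_rcnn.py below); the boxes are made up and the expected values are easy to verify by hand:

import numpy as np
import util

boxes = np.array([[0., 0., 10., 10.], [0., 0., 8., 8.], [20., 20., 30., 30.]])
gt = np.array([[0., 0., 10., 10.]])
print(util.iou(boxes, gt))  # [[1.], [0.64], [0.]]
scores = np.array([0.9, 0.8, 0.7])
print(util.nms(boxes, 0.5, scores))  # box 1 overlaps box 0 with IoU 0.64 > 0.5, so it is suppressed: [0, 2]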
Main module faster_rcnn.py:
import torch
import torchvision
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
import util

'''
Stage 1: derive the anchors' ground truth from the image's gt boxes; the anchor gt is
used to compute the loss against the RoIs produced by the RPN.
Note that at this stage each anchor has only two classes, 0 or 1; -1 means ignore.
The RPN predicts, for the 9 anchors at every feature-map position, a class (0, 1)
and the offsets relative to the gt (dx, dy, dw, dh).
Here we first compute every anchor's assigned true class (0, 1) and true offsets
(dx, dy, dw, dh), which are then used to compute the loss.
For an 800*800 image downsampled 16x, the feature map is 50*50 with 9 anchors per
position, i.e. 50*50*9 = 22500 anchors in total.
For these 22500 anchors, first compute the true class and the offsets relative to the
gt, then compare them with the RPN's predicted classes and offsets to compute the loss.
256 anchors are sampled here, i.e. only 256 of the true anchor labels are 1 or 0;
all the others are -1 (ignored).
'''
# First build an image and set its ground-truth boxes and their labels
image = torch.zeros((1, 3, 800, 800))
bboxes = torch.Tensor([[20, 30, 400, 500], [300, 400, 500, 600]])
labels = torch.Tensor([6, 8])
sub_sample = 16  # downsampling factor

# Get a VGG16 model and use it to extract features, downsampling 16x
model = torchvision.models.vgg16(pretrained=True)
fe = list(model.features)

backbone = []
img_bak = image.clone()
for i in fe:
    img_bak = i(img_bak)
    if(img_bak.shape[2] < 50):
        break
    backbone.append(i)
out_channels = img_bak.shape[1]
backbone = nn.Sequential(*backbone)
feature_map = backbone(image)
print(backbone)
print(feature_map.shape)  # 50*50

# Generate all anchors for the feature map: the feature map is 50*50, and every
# position on it is mapped back to the original image to generate anchors
size = 800//16
centerX = np.arange(16, (size+1)*16, 16)
centerY = np.arange(16, (size+1)*16, 16)
# print(centerX)
center_x = centerX - 8
center_y = centerY - 8
print(center_x)
# Anchor parameters; note the scales are relative to the feature map
anchor_scales = [8, 16, 32]
anchor_ratios = [0.5, 1.0, 2]
anchor_center = np.zeros((size*size, 2))  # 2500*2
# Initialize the anchor centers, 2500 in total
index = 0
for i in range(len(center_x)):
    for j in range(len(center_y)):
        anchor_center[index, 0] = center_x[i]
        anchor_center[index, 1] = center_y[j]
        index += 1
print(anchor_center.shape)

# Generate all the anchors
anchors = torch.zeros((size*size*9, 4), dtype=torch.float32)  # 50*50 positions, 9 anchors each, 4 coords (x1,y1,x2,y2)
index = 0
for c in anchor_center:
    center_x, center_y = c
    for i in range(len(anchor_scales)):
        for j in range(len(anchor_ratios)):
            h = sub_sample * anchor_scales[i] * np.sqrt(anchor_ratios[j])
            w = sub_sample * anchor_scales[i] * np.sqrt(1. / anchor_ratios[j])
            anchors[index, 0] = center_x - w/2
            anchors[index, 1] = center_y - h/2
            anchors[index, 2] = center_x + w/2
            anchors[index, 3] = center_y + h/2
            index += 1
print(anchors.shape)
print(anchors)

# Get the indices of the valid anchors, i.e. those that do not cross the image boundary
valid_anchors_index = np.where(
    (anchors[:, 0] >= 0) &
    (anchors[:, 1] >= 0) &
    (anchors[:, 2] <= 800) &
    (anchors[:, 3] <= 800)
)[0]
print(valid_anchors_index)
valid_anchors = anchors[valid_anchors_index]  # the valid anchors
print(valid_anchors_index.shape)
print(valid_anchors.shape)
# Compute the IoU between every valid anchor and every gt box
ious = util.iou(valid_anchors, bboxes)  # (valid_anchors.shape[0], bboxes.shape[0])
print(ious.shape)
'''
Start classifying the anchors: the anchor with the highest IoU for a gt is foreground,
anchors whose max IoU > 0.7 are foreground, and the rest are background.
'''
gt_maxiou_index = ious.argmax(axis=0)  # axis=0 takes the max over each column; ious has one column per gt
print(gt_maxiou_index)
anchor_maxiou_index = ious.argmax(axis=1)  # max over each row: the gt with which the anchor has its largest IoU
print(anchor_maxiou_index)
# Take each gt's best-anchor IoU and each anchor's best-gt IoU
gt_maxiou = ious[gt_maxiou_index, np.arange(bboxes.shape[0])]
anchor_maxiou = ious[np.arange(valid_anchors.shape[0]), anchor_maxiou_index]
print(gt_maxiou.shape)
print(anchor_maxiou.shape)
gt_maxiou_index = np.where(ious == gt_maxiou)[0]  # indices of the anchors that have the max IoU with some gt

# Thresholds: anchors with IoU >= 0.7 are foreground, < 0.3 background;
# sample 256 anchors with a foreground ratio of 0.5
pos_iou_thre = 0.7
neg_iou_thre = 0.3
pos_ratio = 0.5
n_sample = 256
valid_anchor_labels = np.empty((valid_anchors.shape[0]))
valid_anchor_labels.fill(-1)  # initialize to -1, meaning ignore
valid_anchor_labels[gt_maxiou_index] = 1
valid_anchor_labels[anchor_maxiou >= pos_iou_thre] = 1
valid_anchor_labels[anchor_maxiou < neg_iou_thre] = 0
print(valid_anchor_labels.shape)
# Sample the positives and negatives (n_pos must be an int for np.random.choice)
n_pos = int(n_sample*pos_ratio)
pos_index = np.where(valid_anchor_labels == 1)[0]
if(len(pos_index) > n_pos):
    disable_index = np.random.choice(pos_index, size=(len(pos_index)-n_pos), replace=False)
    valid_anchor_labels[disable_index] = -1

# If there are fewer positives than n_pos, fill the rest of the 256 samples with negatives
n_neg = int(n_sample*(1-pos_ratio))
if(len(pos_index) <= n_pos):
    n_neg = n_sample - len(pos_index)
neg_index = np.where(valid_anchor_labels == 0)[0]
if(len(neg_index) > n_neg):
    disable_index = np.random.choice(neg_index, size=(len(neg_index)-n_neg), replace=False)
    valid_anchor_labels[disable_index] = -1
# Positives and negatives are both sampled now, 256 in total
print(np.sum(valid_anchor_labels == 1))
print(np.sum(valid_anchor_labels == 0))

# Assign each anchor its offsets dx, dy, dw, dh: each anchor is matched to the gt
# with which it has the largest IoU, i.e. the anchor's position relative to that gt
'''
t_{x} = (x - x_{a})/w_{a}
t_{y} = (y - y_{a})/h_{a}
t_{w} = log(w / w_{a})
t_{h} = log(h / h_{a})
x, y, w, h are the ground-truth box's center coordinates, width, and height;
x_a, y_a, w_a, h_a are the anchor box's center coordinates, width, and height.
'''
anchor_maxiou_gtbox = bboxes[anchor_maxiou_index]
print(anchor_maxiou_gtbox.shape)
w = anchor_maxiou_gtbox[:, 2] - anchor_maxiou_gtbox[:, 0]
h = anchor_maxiou_gtbox[:, 3] - anchor_maxiou_gtbox[:, 1]
x = anchor_maxiou_gtbox[:, 0] + w/2
y = anchor_maxiou_gtbox[:, 1] + h/2
anchor_w = valid_anchors[:, 2] - valid_anchors[:, 0]
anchor_h = valid_anchors[:, 3] - valid_anchors[:, 1]
anchor_x = valid_anchors[:, 0] + anchor_w/2
anchor_y = valid_anchors[:, 1] + anchor_h/2
eps = torch.tensor(1e-10)
anchor_h = np.maximum(anchor_h, eps)  # guard against division by zero and log(0)
anchor_w = np.maximum(anchor_w, eps)
dx = (x-anchor_x)/anchor_w
dy = (y-anchor_y)/anchor_h
dw = np.log(w/anchor_w)
dh = np.log(h/anchor_h)
anchor_location = np.vstack((dx, dy, dw, dh)).transpose()
print(anchor_location.shape)
anchor_labels = np.zeros((anchors.shape[0]), dtype=np.int32)
anchor_labels.fill(-1)
anchor_locations = np.zeros_like(anchors, dtype=np.float32)
anchor_locations.fill(-1)
# Scatter the valid anchors' labels and offsets back into the full-size arrays
anchor_labels[valid_anchors_index] = valid_anchor_labels
anchor_locations[valid_anchors_index] = anchor_location
print(anchor_labels.shape)
print(anchor_locations.shape)
# End of stage 1: the true anchor classes and the offsets relative to the gt.

'''
Stage 2: use the RPN to predict each anchor's class and offset coordinates.
'''
class RPN(nn.Module):
    def __init__(self):
        super(RPN, self).__init__()
        mid_channels = 512
        in_channels = 512
        self.conv1 = nn.Conv2d(in_channels, mid_channels, 3, 1, 1)
        self.reg_layer = nn.Conv2d(mid_channels, len(anchor_scales)*len(anchor_ratios)*4, 1, 1, 0)
        self.cls_layer = nn.Conv2d(mid_channels, len(anchor_scales)*len(anchor_ratios)*2, 1, 1, 0)
        self.conv1.weight.data.normal_(0, 0.01)
        self.conv1.bias.data.zero_()
        self.reg_layer.weight.data.normal_(0, 0.01)
        self.reg_layer.bias.data.zero_()
        self.cls_layer.weight.data.normal_(0, 0.01)
        self.cls_layer.bias.data.zero_()

    def forward(self, x):
        x = self.conv1(x)
        pred_anchor_location = self.reg_layer(x)
        pred_anchor_cls = self.cls_layer(x)
        return pred_anchor_location, pred_anchor_cls

rpn = RPN()
print(feature_map.shape)
pred_anchor_location, pred_anchor_cls = rpn(feature_map)
print(pred_anchor_location.shape)
print(pred_anchor_cls.shape)
pred_anchor_location = pred_anchor_location.permute(0, 2, 3, 1).contiguous().view(1, -1, 4)
pred_anchor_cls = pred_anchor_cls.permute(0, 2, 3, 1).contiguous().view(1, -1, 2)
print(pred_anchor_location.shape)
print(pred_anchor_cls.shape)
print(anchor_locations.shape)
print(anchor_labels.shape)
# pred_anchor_location pairs with anchor_locations and pred_anchor_cls with anchor_labels;
# they are used to compute the RPN loss
# objectness_score stores each anchor's predicted foreground score
objectness_score = pred_anchor_cls.view(1, 50, 50, 9, 2)[:, :, :, :, 1].contiguous().view(1, -1)  # each anchor's predicted positive-class score
# End of stage 2: the RPN's predicted classes and offsets for all anchors, to be compared
# with the true classes and offsets from stage 1 to compute the RPN loss
'''
Stage 3: use the classes and offsets predicted by the RPN to generate RoIs for the RoI head.
For the 22500 predicted anchors, first restore box coordinates from the predicted offsets,
then run NMS on the top n1 boxes and keep the top n2 of the survivors for the RoI head.
What the RPN produces are offsets of the original anchors relative to the gt;
stage 1 already computed the true offsets (256 valid ones) from the actual gt.
The goal of this stage is to produce the boxes that are fed into the RoI head.
'''
nms_thre = 0.7
n_train_pre_nms = 12000
n_train_post_nms = 2000
n_test_pre_nms = 6000
n_test_post_nms = 300
min_size = 16
# First convert the offsets predicted by the RPN into (x1,y1,x2,y2) coordinates
'''
x = (w_{a} * ctr_x_{p}) + ctr_x_{a}
y = (h_{a} * ctr_y_{p}) + ctr_y_{a}
h = np.exp(h_{p}) * h_{a}
w = np.exp(w_{p}) * w_{a}
Infer the predicted gt positions from the original anchor coordinates and the
dx, dy, dw, dh produced by the RPN (the inverse of the stage-1 encoding).
'''
pred_anchor_location_numpy = pred_anchor_location[0].data.numpy()
objectness_score_numpy = objectness_score[0].data.numpy()
anchor_w = anchors[:, 2] - anchors[:, 0]
anchor_h = anchors[:, 3] - anchors[:, 1]
anchor_x = anchors[:, 0] + anchor_w/2
anchor_y = anchors[:, 1] + anchor_h/2
dx = torch.from_numpy(pred_anchor_location_numpy[:, 0])
dy = torch.from_numpy(pred_anchor_location_numpy[:, 1])
dw = torch.from_numpy(pred_anchor_location_numpy[:, 2])
dh = torch.from_numpy(pred_anchor_location_numpy[:, 3])
# Recover each predicted box's center_x, center_y, w, h in the original image
# from the predicted offsets
pred_gt_center_x = dx*anchor_w + anchor_x
pred_gt_center_y = dy*anchor_h + anchor_y
pred_gt_w = np.exp(dw)*anchor_w
pred_gt_h = np.exp(dh)*anchor_h
print(pred_gt_center_x.shape)
print(pred_gt_center_y.shape)
print(pred_gt_w.shape)
print(pred_gt_h.shape)
# Convert center_x, center_y, w, h into top-left and bottom-right corners (x1,y1), (x2,y2)
rois = torch.zeros_like(pred_anchor_location[0])  # (22500, 4)
rois[:, 0] = pred_gt_center_x - pred_gt_w/2
rois[:, 1] = pred_gt_center_y - pred_gt_h/2
rois[:, 2] = pred_gt_center_x + pred_gt_w/2
rois[:, 3] = pred_gt_center_y + pred_gt_h/2
print(rois.shape)
# Clip the resulting boxes to the image, i.e. clamp coordinates that cross the boundary
img_size = (800, 800)
rois[:, 0] = torch.clamp(rois[:, 0], 0, img_size[0])
rois[:, 1] = torch.clamp(rois[:, 1], 0, img_size[1])
rois[:, 2] = torch.clamp(rois[:, 2], 0, img_size[0])
rois[:, 3] = torch.clamp(rois[:, 3], 0, img_size[1])
print(rois)
# Drop predicted boxes whose height or width is smaller than min_size
w = rois[:, 2] - rois[:, 0]
h = rois[:, 3] - rois[:, 1]
keep = np.where((h.numpy() >= min_size) & (w.numpy() >= min_size))[0]
rois = rois[keep, :]
before_scores = objectness_score[0][keep]
before_scores_numpy = before_scores.data.numpy()
print(rois.shape)
print(before_scores.shape)
print(before_scores_numpy.shape)
print(before_scores_numpy.ravel().shape)
# Sort before_scores from high to low, run NMS on the top n1 boxes,
# then feed the top n2 of the survivors into the RoI head
order = np.argsort(before_scores_numpy)[::-1]
order = order[:n_train_pre_nms]  # 12000
order = torch.from_numpy(order.copy())
rois = rois[order, :]  # 12000*4
scores = before_scores[order]  # 12000
rois_numpy = rois.data.numpy()
scores_numpy = scores.data.numpy()
keep = util.nms(rois_numpy, nms_thre, scores_numpy)
print(len(keep))
keep = keep[:n_train_post_nms]
rois = rois[keep, :]
print(rois.shape)
# The RoIs to be fed into the RoI head (the RPN's proposals) have now been selected

'''
Stage 4: further sample the RoIs produced in stage 3. First match the boxes handed
over after the RPN's predictions: compute each box's IoU with every gt, sample
according to the IoU, and assign offset coordinates.
'''
n_sample = 128
pos_ratio = 0.25
pos_iou_thre = 0.5
neg_iou_thre_hi = 0.5
neg_iou_thre_lo = 0.0
'''
Sampling first: for the RoIs produced by the RPN, compute their actual labels and
offsets relative to the gt, to be compared with the RoI head's outputs for the loss.
'''
# Compute the IoU
ious = util.iou(rois, bboxes)  # 2000*2
print(ious)
print(ious.shape)
# Get each RoI's max IoU and the corresponding gt
gt_argroi = ious.argmax(axis=1)
roi_max_ious = ious.max(axis=1)
gt_roi_label = labels[gt_argroi]  # assign each RoI its true label
# Sample the positives
n_pos = int(n_sample*pos_ratio)
pos_index = np.where(roi_max_ious > pos_iou_thre)[0]
pos_roi_this_image = int(min(n_pos, len(pos_index)))
if len(pos_index) > 0:
    pos_index = np.random.choice(pos_index, size=pos_roi_this_image, replace=False)
print(pos_index)
print(len(pos_index))

neg_roi_this_image = n_sample - pos_roi_this_image
neg_index = np.where((roi_max_ious < neg_iou_thre_hi) & (roi_max_ious > neg_iou_thre_lo))[0]
neg_roi_this_image = int(min(neg_roi_this_image, len(neg_index)))
if len(neg_index) > 0:
    neg_index = np.random.choice(neg_index, size=neg_roi_this_image, replace=False)
print(neg_index)
print(len(neg_index))
# The positive and negative sample indices are chosen; now compute these RoIs'
# true labels and true offsets as the RoI-stage gt
keep_index = np.append(pos_index, neg_index)
print(keep_index)
sample_rois = rois[keep_index, :]
print(sample_rois.shape)
# Compute the sampled RoIs' true offset coordinates and true classes
gt_for_sample_rois = bboxes[gt_argroi[keep_index]]  # the gt box matched to each sample_roi
w = sample_rois[:, 2] - sample_rois[:, 0]
h = sample_rois[:, 3] - sample_rois[:, 1]
center_x = sample_rois[:, 0] + w/2
center_y = sample_rois[:, 1] + h/2
gt_w = gt_for_sample_rois[:, 2] - gt_for_sample_rois[:, 0]
gt_h = gt_for_sample_rois[:, 3] - gt_for_sample_rois[:, 1]
gt_center_x = gt_for_sample_rois[:, 0] + gt_w/2  # use the gt's own width/height here, not the RoI's
gt_center_y = gt_for_sample_rois[:, 1] + gt_h/2
eps = torch.tensor(1e-10)
h = np.maximum(h, eps)
w = np.maximum(w, eps)
dx = (gt_center_x - center_x)/w
dy = (gt_center_y - center_y)/h
dw = np.log(gt_w/w)
dh = np.log(gt_h/h)
gt_sample_roi_locations = np.vstack((dx, dy, dw, dh)).transpose()
gt_sample_roi_labels = gt_roi_label[keep_index]
gt_sample_roi_labels[pos_roi_this_image:] = 0  # set the negatives' labels to 0 (background)
'''
gt_sample_roi_locations and gt_sample_roi_labels are the ground truth of the RoI stage
'''
print(gt_sample_roi_locations)
print(gt_sample_roi_locations.shape)
print(gt_sample_roi_labels.shape)
print(sample_rois)
# gt_sample_roi_locations and gt_sample_roi_labels are each sample_roi's true label and offsets;
# sample_rois will be fed into the RoI head to predict labels and offsets
print(sample_rois.shape)
roi_indexes = torch.zeros((sample_rois.shape[0]), dtype=torch.int32)
print(roi_indexes.shape)
# rois is the RoI head's input: sample_rois plus one column for the image index
# (there is only one image in this example)

rois = torch.zeros((sample_rois.shape[0], sample_rois.shape[1]+1))
rois[:, 0] = roi_indexes
rois[:, 1:] = sample_rois
print(rois.shape)
print(rois)
'''
The logic here: first add one dimension to sample_rois to record which image each RoI
belongs to, since in practice a batch may contain several images; only one image is
passed in this code, so the column is all zeros. Then downsample sample_rois 16x to
map them onto the feature map, run them through RoI pooling, and pass the pooled
result into the RoI head to get the predictions.
'''
size = 7
roi_pooling = nn.AdaptiveMaxPool2d((size, size))  # the output size must be passed as one argument
out_put = []  # stores the results of RoI pooling
# Downsample by sub_sample to map from the original image onto the feature map
rois[:, 1:].mul_(1.0/16.0)
print(feature_map.shape)
for i in range(rois.shape[0]):
    roi = rois[i]
    img_index = int(roi[0])
    # Crop the RoI's region from the feature map; rows index y and columns index x
    feature_im = feature_map[img_index, :, int(roi[2]):int(roi[4]), int(roi[1]):int(roi[3])]
    roi_pooling_im = roi_pooling(feature_im)
    out_put.append(roi_pooling_im)
out_put = torch.stack(out_put)
print(out_put.shape)
# out_put holds the sample_rois' feature maps after RoI pooling
out_put_linear = out_put.view(out_put.shape[0], -1)  # everything after this is fully connected
print(out_put_linear.shape)
class ROIHead(nn.Module):
    def __init__(self, num_class):
        super(ROIHead, self).__init__()
        self.num_class = num_class
        self.linear1 = nn.Linear(25088, 4096)
        self.linear2 = nn.Linear(4096, 4096)
        # Input: each RoI's feature map after RoI pooling; predict the class of the
        # object in each RoI and its offset coordinates
        self.location = nn.Linear(4096, (num_class+1)*4)  # offset coordinates for every class
        self.score = nn.Linear(4096, (num_class+1))  # a score for every class
        self._init_weight()

    def _init_weight(self):
        self.linear1.weight.data.normal_(0, 0.01)
        self.linear1.bias.data.zero_()
        self.linear2.weight.data.normal_(0, 0.01)
        self.linear2.bias.data.zero_()
        self.location.weight.data.normal_(0, 0.01)
        self.location.bias.data.zero_()
        self.score.weight.data.normal_(0, 0.01)
        self.score.bias.data.zero_()

    def forward(self, x):
        # ReLU between the FC layers (as in the VGG classifier); without it the two
        # linear layers would collapse into a single linear map
        x = F.relu(self.linear1(x))
        x = F.relu(self.linear2(x))
        pred_roi_locations = self.location(x)  # (num_class+1)*4
        pred_roi_labels = self.score(x)  # num_class+1
        return pred_roi_locations, pred_roi_labels

roihead = ROIHead(num_class=20)
print(out_put_linear.shape)
pred_roi_locations, pred_roi_labels = roihead(out_put_linear)
print(pred_roi_locations.shape)  # (n_sample, (num_class+1)*4)
print(pred_roi_labels.shape)  # (n_sample, (num_class+1))

'''
Stage 5: compute the loss, in two parts: the RPN loss and the RoI loss.
'''
# Weight between the classification and regression losses
loss_lambda = 10
print("RPN Loss")
print(anchor_locations.shape)
print(anchor_labels.shape)
print(pred_anchor_location.shape)
print(pred_anchor_cls.shape)
anchor_locations = torch.from_numpy(anchor_locations)
anchor_labels = torch.from_numpy(anchor_labels)
pred_anchor_location = pred_anchor_location[0]
pred_anchor_cls = pred_anchor_cls[0]
print(anchor_locations.shape, anchor_labels.shape, pred_anchor_location.shape, pred_anchor_cls.shape)
# Classification loss: cross entropy
anchor_labels = anchor_labels.long()
rpn_cls_loss = F.cross_entropy(pred_anchor_cls, anchor_labels, ignore_index=-1)
print(rpn_cls_loss)
# Regression loss: smooth L1, computed only for anchors whose gt label is 1
pos_index = anchor_labels > 0
print(pos_index.shape)
print(pos_index)
mask = pos_index.unsqueeze(1).expand_as(anchor_locations)
print(mask.shape)
print(mask)
# Select the positively labeled anchor locations for the loss
anchor_locations = anchor_locations[mask].view(-1, 4)  # 18*4
pred_anchor_location = pred_anchor_location[mask].view(-1, 4)  # 18*4
x = torch.abs(anchor_locations - pred_anchor_location)
print(x.shape)
rpn_loc_loss = (x < 1).float()*0.5*x**2 + (x >= 1).float()*(x-0.5)
rpn_loc_loss = rpn_loc_loss.sum()  # total regression loss; it still has to be averaged
print(rpn_loc_loss)
n_reg = (anchor_labels > 0).float().sum()  # number of positive anchors
print(n_reg)
rpn_loc_loss = rpn_loc_loss/n_reg  # average
print(rpn_loc_loss)
rpn_loss = rpn_cls_loss + loss_lambda*rpn_loc_loss
print("rpn loss:{}".format(rpn_loss))

print("RPN Loss Finished")
print("-----------------------------------")
# RoI loss
print("-----------------------------------")
print("ROI Loss")
print(gt_sample_roi_locations.shape)
print(gt_sample_roi_labels.shape)
print(pred_roi_locations.shape)
print(pred_roi_labels.shape)
gt_sample_roi_locations = torch.from_numpy(gt_sample_roi_locations)
gt_sample_roi_labels = gt_sample_roi_labels.long()
# Classification loss
roi_cls_loss = F.cross_entropy(pred_roi_labels, gt_sample_roi_labels, ignore_index=-1)
print(roi_cls_loss)
# Regression loss
pred_roi_locations = pred_roi_locations.view(pred_roi_locations.shape[0], -1, 4)
print(pred_roi_locations.shape)  # 128*21*4
# For each RoI, take the predicted offsets of its gt class to compare with gt_roi_locations
pred_roi_locations = pred_roi_locations[np.arange(0, pred_roi_locations.shape[0]), gt_sample_roi_labels]  # 128*4
print(pred_roi_locations.shape)

# Select the positively labeled samples and compute their loss
pos_index = gt_sample_roi_labels > 0  # positive labels
mask = pos_index.unsqueeze(1).expand_as(pred_roi_locations)  # mask
print(mask.shape)
pred_roi_locations = pred_roi_locations[mask].view(-1, 4)  # the positively labeled predictions
gt_sample_roi_locations = gt_sample_roi_locations[mask].view(-1, 4)  # likewise for the gt
print(pred_roi_locations.shape, gt_sample_roi_locations.shape)
x = torch.abs(pred_roi_locations - gt_sample_roi_locations)
roi_loc_loss = (x < 1).float()*0.5*x**2 + (x >= 1).float()*(x-0.5)
roi_loc_loss = roi_loc_loss.sum()
print(roi_loc_loss)
n_reg = (gt_sample_roi_labels > 0).sum()
roi_loc_loss = roi_loc_loss/n_reg
roi_loss = roi_cls_loss + loss_lambda*roi_loc_loss
print(roi_loc_loss)
print("roi_loss: {}".format(roi_loss))
print("ROI Loss Finished")
total_loss = rpn_loss + roi_loss
print("total loss: {}".format(total_loss))
total_loss.backward()
TODO
Next I will start looking into deep-learning-based edge detection methods (applied to defect detection).
