yolov5 DeepSort Tracking (5): DeepSort Multi-Object Tracking -- Pedestrian/Vehicle ReID Training
The project ships pre-trained weights for the pedestrian ReID model: a file named ckpt.t7 in the yolov5-deepsort\deep_sort\deep_sort\deep\checkpoint directory. These weights were trained with the DeepSort pipeline on a pedestrian ReID dataset and are used to extract pedestrian appearance features.
If you have the Market-1501 dataset (or another good ReID dataset), you can retrain these weights yourself. This article walks through pedestrian ReID training on Market-1501.
Market-1501 download link: https://pan.baidu.com/s/1dlG2CX6ZQKCW8cp9uwptEw?pwd=rwzn
Password: rwzn
1. Pedestrian ReID Training
1.1 The Market-1501 Dataset
The Market-1501 dataset was collected on the Tsinghua University campus, built in 2015, and released publicly. It contains 1,501 pedestrians and 32,668 detected pedestrian bounding boxes captured by 6 cameras (5 high-resolution and 1 low-resolution). Each pedestrian is captured by at least 2 cameras and may have multiple images from a single camera. The training set contains 751 identities with 12,936 images (on average 17.2 images per person); the test set contains 750 identities with 19,732 images (on average 26.3 images per person).
1.1.1 Dataset Directory Structure

The dataset contains five folders:
- bounding_box_test : test-set images
- bounding_box_train : training-set images
- query : 750 identities; for each identity, one randomly chosen image per camera serves as a query image
- gt_query : ground-truth relevance annotations for the queries
- gt_bbox : hand-drawn bounding boxes, used to evaluate the quality of the DPM deep-learning bounding-box detector
Image naming convention
Take 0001_c1s1_000151_01.jpg as an example:
- **0001** - person ID: identities are numbered from 0001 to 1501
- **c1** - camera: six cameras were used, labelled c1 to c6
- **s1** - sequence: each camera recorded several video sequences, labelled s1 to sn
- **000151** - frame: the frame number within sequence c1s1
- **01** - bounding box: the box number within that frame. Because the DPM detector may find several people in one frame, a single frame can yield several boxes; where no one is detected, a hand-drawn box (numbered 00) is used instead.
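The naming convention above can be checked with a small helper. `parse_market_filename` is a hypothetical function written for illustration, not part of the project:

```python
import re

def parse_market_filename(name):
    """Split a Market-1501 file name such as 0001_c1s1_000151_01.jpg
    into its components (person ID, camera, sequence, frame, box)."""
    m = re.match(r"(\d{4})_c(\d)s(\d+)_(\d{6})_(\d{2})\.jpg", name)
    if m is None:
        raise ValueError("not a Market-1501 file name: %s" % name)
    person, cam, seq, frame, box = m.groups()
    return {
        "person_id": person,   # 0001-1501
        "camera": int(cam),    # c1-c6
        "sequence": int(seq),  # s1-sn
        "frame": int(frame),   # frame number within the sequence
        "box": int(box),       # 00 = hand-drawn box
    }

print(parse_market_filename("0001_c1s1_000151_01.jpg"))
# → {'person_id': '0001', 'camera': 1, 'sequence': 1, 'frame': 151, 'box': 1}
```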
1.2 Preparing the Pedestrian ReID Dataset
Copy the Market-1501-v15.09.15 dataset into yolov5-deepsort/deep_sort/deep_sort/deep, then run the prepare_person.py script from that directory:
python prepare_person.py
Note: the script is adapted from https://github.com/layumi/Person_reID_baseline_pytorch/blob/master/prepare.py
Running the script creates a pytorch folder containing train and test subfolders, in which images of the same person are grouped into a folder named after that person's ID.
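The core of that preparation step is grouping images by the leading person ID in each file name. A minimal sketch of the idea (`group_by_id` and the paths in the comment are illustrative, not the script's exact code):

```python
import os
import shutil

def group_by_id(src_dir, dst_dir):
    """Copy every image in src_dir into dst_dir/<person_id>/,
    where the person ID is the part of the name before the first '_'."""
    for name in os.listdir(src_dir):
        if not name.endswith(".jpg"):
            continue
        person_id = name.split("_")[0]
        id_dir = os.path.join(dst_dir, person_id)
        os.makedirs(id_dir, exist_ok=True)
        shutil.copyfile(os.path.join(src_dir, name),
                        os.path.join(id_dir, name))

# e.g. group_by_id("Market-1501-v15.09.15/bounding_box_train", "pytorch/train")
```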


Create a Market-1501 directory in the same location and copy the train and test folders into it.
Reason: torchvision.datasets.ImageFolder can then load the image data from disk conveniently.
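ImageFolder treats every subdirectory as one class and assigns class indices from the sorted subdirectory names, so the 751 person-ID folders become 751 classes automatically. In spirit it builds a mapping like the simplified re-implementation below (the real loader also collects the image paths):

```python
import os

def folder_classes(root):
    """Mimic torchvision ImageFolder's class discovery: each
    subdirectory of root is a class, indexed in sorted order."""
    classes = sorted(d for d in os.listdir(root)
                     if os.path.isdir(os.path.join(root, d)))
    class_to_idx = {c: i for i, c in enumerate(classes)}
    return classes, class_to_idx
```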
1.3 Training the Pedestrian ReID Network
Run the following from the yolov5-deepsort\deep_sort\deep_sort\deep directory:
python train.py --data-dir ./Market-1501
train.py in the project directory yolov5-deepsort\deep_sort\deep_sort\deep:
```python
import argparse
import os
import time

import numpy as np
import matplotlib.pyplot as plt
import torch
import torch.backends.cudnn as cudnn
import torchvision

from model import Net

parser = argparse.ArgumentParser(description="Train on market1501")
parser.add_argument("--data-dir", default='/hy-tmp/yolov5-deepsort/deep_sort/deep_sort/deep/Market-1501', type=str)
parser.add_argument("--no-cuda", action="store_true")
parser.add_argument("--gpu-id", default=0, type=int)
parser.add_argument("--lr", default=0.1, type=float)
parser.add_argument("--interval", '-i', default=20, type=int)
parser.add_argument('--resume', '-r', action='store_true')
args = parser.parse_args()

# device
device = "cuda:{}".format(args.gpu_id) if torch.cuda.is_available() and not args.no_cuda else "cpu"
if torch.cuda.is_available() and not args.no_cuda:
    cudnn.benchmark = True

# data loading
root = args.data_dir
train_dir = os.path.join(root, "train")
test_dir = os.path.join(root, "test")
transform_train = torchvision.transforms.Compose([
    torchvision.transforms.RandomCrop((128, 64), padding=4),
    torchvision.transforms.RandomHorizontalFlip(),
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])
transform_test = torchvision.transforms.Compose([
    torchvision.transforms.Resize((128, 64)),
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])
trainloader = torch.utils.data.DataLoader(
    torchvision.datasets.ImageFolder(train_dir, transform=transform_train),
    batch_size=64, shuffle=True
)
testloader = torch.utils.data.DataLoader(
    torchvision.datasets.ImageFolder(test_dir, transform=transform_test),
    batch_size=64, shuffle=True
)
num_classes = max(len(trainloader.dataset.classes), len(testloader.dataset.classes))
print("num_classes = %s" % num_classes)

# net definition
start_epoch = 0
net = Net(num_classes=num_classes)
if args.resume:
    assert os.path.isfile("./checkpoint/ckpt.t7"), "Error: no checkpoint file found!"
    print('Loading from checkpoint/ckpt.t7')
    checkpoint = torch.load("./checkpoint/ckpt.t7")
    # import ipdb; ipdb.set_trace()
    net_dict = checkpoint['net_dict']
    net.load_state_dict(net_dict)
    best_acc = checkpoint['acc']
    start_epoch = checkpoint['epoch']
net.to(device)

# loss and optimizer
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(net.parameters(), args.lr, momentum=0.9, weight_decay=5e-4)
best_acc = 0.

# train function for each epoch
def train(epoch):
    print("\nEpoch : %d" % (epoch + 1))
    net.train()
    training_loss = 0.
    train_loss = 0.
    correct = 0
    total = 0
    interval = args.interval
    start = time.time()
    for idx, (inputs, labels) in enumerate(trainloader):
        # forward
        inputs, labels = inputs.to(device), labels.to(device)
        outputs = net(inputs)
        loss = criterion(outputs, labels)

        # backward
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # accumulating
        training_loss += loss.item()
        train_loss += loss.item()
        correct += outputs.max(dim=1)[1].eq(labels).sum().item()
        total += labels.size(0)

        # print
        if (idx + 1) % interval == 0:
            end = time.time()
            print("[progress:{:.1f}%]time:{:.2f}s Loss:{:.5f} Correct:{}/{} Acc:{:.3f}%".format(
                100. * (idx + 1) / len(trainloader), end - start, training_loss / interval,
                correct, total, 100. * correct / total
            ))
            training_loss = 0.
            start = time.time()

    return train_loss / len(trainloader), 1. - correct / total

def test(epoch):
    global best_acc
    net.eval()
    test_loss = 0.
    correct = 0
    total = 0
    start = time.time()
    with torch.no_grad():
        for idx, (inputs, labels) in enumerate(testloader):
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = net(inputs)
            loss = criterion(outputs, labels)

            test_loss += loss.item()
            correct += outputs.max(dim=1)[1].eq(labels).sum().item()
            total += labels.size(0)

        print("Testing ...")
        end = time.time()
        print("[progress:{:.1f}%]time:{:.2f}s Loss:{:.5f} Correct:{}/{} Acc:{:.3f}%".format(
            100. * (idx + 1) / len(testloader), end - start, test_loss / len(testloader),
            correct, total, 100. * correct / total
        ))

    # saving checkpoint
    acc = 100. * correct / total
    if acc > best_acc:
        best_acc = acc
        print("Saving parameters to checkpoint/ckpt.t7")
        checkpoint = {
            'net_dict': net.state_dict(),
            'acc': acc,
            'epoch': epoch,
        }
        if not os.path.isdir('checkpoint'):
            os.mkdir('checkpoint')
        torch.save(checkpoint, './checkpoint/ckpt.t7')

    return test_loss / len(testloader), 1. - correct / total

# plot figure
x_epoch = []
record = {'train_loss': [], 'train_err': [], 'test_loss': [], 'test_err': []}
fig = plt.figure()
ax0 = fig.add_subplot(121, title="loss")
ax1 = fig.add_subplot(122, title="top1err")

def draw_curve(epoch, train_loss, train_err, test_loss, test_err):
    global record
    record['train_loss'].append(train_loss)
    record['train_err'].append(train_err)
    record['test_loss'].append(test_loss)
    record['test_err'].append(test_err)

    x_epoch.append(epoch)
    ax0.plot(x_epoch, record['train_loss'], 'bo-', label='train')
    ax0.plot(x_epoch, record['test_loss'], 'ro-', label='val')
    ax1.plot(x_epoch, record['train_err'], 'bo-', label='train')
    ax1.plot(x_epoch, record['test_err'], 'ro-', label='val')
    if epoch == 0:
        ax0.legend()
        ax1.legend()
    fig.savefig("train.jpg")

# lr decay
def lr_decay():
    global optimizer
    for params in optimizer.param_groups:
        params['lr'] *= 0.1
        lr = params['lr']
        print("Learning rate adjusted to {}".format(lr))

def main():
    total_epoches = 40
    for epoch in range(start_epoch, start_epoch + total_epoches):
        train_loss, train_err = train(epoch)
        test_loss, test_err = test(epoch)
        draw_curve(epoch, train_loss, train_err, test_loss, test_err)
        if (epoch + 1) % (total_epoches // 2) == 0:
            lr_decay()

if __name__ == '__main__':
    main()
```

After 40 epochs, accuracy reaches 97.205% on the training set and 82.157% on the test set. If that is not good enough, train for more epochs.
Because the dataset contains 751 identities, the model is trained as a 751-class classifier, and the best-performing parameters are saved.
To use the result for ReID, load the saved weights and build the network with Net(reid=True); it then returns an L2-normalized appearance feature vector for each pedestrian (the 512-dimensional pooled feature, taken before the classifier). See the pedestrian ReID network described below.
2. The Pedestrian ReID Network
model.py under yolov5-deepsort\deep_sort\deep_sort\deep:
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BasicBlock(nn.Module):
    def __init__(self, c_in, c_out, is_downsample=False):
        super(BasicBlock, self).__init__()
        self.is_downsample = is_downsample
        if is_downsample:
            self.conv1 = nn.Conv2d(c_in, c_out, 3, stride=2, padding=1, bias=False)
        else:
            self.conv1 = nn.Conv2d(c_in, c_out, 3, stride=1, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(c_out)
        self.relu = nn.ReLU(True)
        self.conv2 = nn.Conv2d(c_out, c_out, 3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(c_out)
        if is_downsample:
            self.downsample = nn.Sequential(
                nn.Conv2d(c_in, c_out, 1, stride=2, bias=False),
                nn.BatchNorm2d(c_out)
            )
        elif c_in != c_out:
            self.downsample = nn.Sequential(
                nn.Conv2d(c_in, c_out, 1, stride=1, bias=False),
                nn.BatchNorm2d(c_out)
            )
            self.is_downsample = True

    def forward(self, x):
        y = self.conv1(x)
        y = self.bn1(y)
        y = self.relu(y)
        y = self.conv2(y)
        y = self.bn2(y)
        if self.is_downsample:
            x = self.downsample(x)
        return F.relu(x.add(y), True)

def make_layers(c_in, c_out, repeat_times, is_downsample=False):
    blocks = []
    for i in range(repeat_times):
        if i == 0:
            blocks += [BasicBlock(c_in, c_out, is_downsample=is_downsample), ]
        else:
            blocks += [BasicBlock(c_out, c_out), ]
    return nn.Sequential(*blocks)

class Net(nn.Module):
    def __init__(self, num_classes=751, reid=False):
        super(Net, self).__init__()
        # input: 3 x 128 x 64
        self.conv = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=1, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            # nn.Conv2d(32,32,3,stride=1,padding=1),
            # nn.BatchNorm2d(32),
            # nn.ReLU(inplace=True),
            nn.MaxPool2d(3, 2, padding=1),
        )
        # 64 x 64 x 32
        self.layer1 = make_layers(64, 64, 2, False)
        # 64 x 64 x 32
        self.layer2 = make_layers(64, 128, 2, True)
        # 128 x 32 x 16
        self.layer3 = make_layers(128, 256, 2, True)
        # 256 x 16 x 8
        self.layer4 = make_layers(256, 512, 2, True)
        # 512 x 8 x 4
        self.avgpool = nn.AvgPool2d((8, 4), 1)
        # 512 x 1 x 1
        self.reid = reid
        self.classifier = nn.Sequential(
            nn.Linear(512, 256),
            nn.BatchNorm1d(256),
            nn.ReLU(inplace=True),
            nn.Dropout(),
            nn.Linear(256, num_classes),
        )

    def forward(self, x):
        x = self.conv(x)
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        x = self.avgpool(x)
        x = x.view(x.size(0), -1)
        # B x 512
        if self.reid:
            x = x.div(x.norm(p=2, dim=1, keepdim=True))
            return x
        # classifier
        x = self.classifier(x)
        return x
```
Structurally, the network closely resembles ResNet-34. The main difference is that the conv2_x, conv3_x, conv4_x, and conv5_x stages stack [2, 2, 2, 2] BasicBlocks, whereas ResNet-34 uses [3, 4, 6, 3].
When the ReID output is needed (reid=True), the network skips the classifier, L2-normalizes the pooled feature vector, and returns it as the pedestrian's appearance feature.
3. Vehicle ReID Training
3.1 Downloading the Vehicle ReID Dataset
Download car-dataset.zip from Baidu Netdisk and extract it into yolov5-deepsort\deep_sort\deep_sort\deep.
- Dataset source: <>
The dataset contains images of 576 vehicles.

3.2 Preparing the Vehicle ReID Dataset
Run the following from the yolov5-deepsort\deep_sort\deep_sort\deep directory:
python prepare_car.py
After processing, the car-reid-dataset directory contains train and test folders holding the training and test images; each subfolder contains images of the same vehicle taken from different viewpoints.
Image file names are laid out as follows:
- digits [0:4]: vehicle ID
- digits [4:6]: camera ID
- digits [6:11]: tracking ID
- last four digits: image number
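Based on the positions listed above, the ID fields can be sliced straight out of the digit string. `parse_car_filename` is a hypothetical helper written against those positions; any separators in the real file names are simply discarded:

```python
def parse_car_filename(name):
    """Split a car-reid-dataset file name into its ID fields,
    following the digit positions described above."""
    stem = name.rsplit(".", 1)[0]
    digits = "".join(ch for ch in stem if ch.isdigit())
    return {
        "vehicle_id": digits[0:4],   # first four digits
        "camera_id": digits[4:6],    # next two digits
        "track_id": digits[6:11],    # next five digits
        "image_no": digits[-4:],     # last four digits
    }
```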
3.3 Modifying model.py
In model.py under yolov5-deepsort\deep_sort\deep_sort\deep, change

```python
class Net(nn.Module):
    def __init__(self, num_classes=751, reid=False):
```

to

```python
class Net(nn.Module):
    def __init__(self, num_classes=576, reid=False):
```
3.4 Training the Vehicle ReID Network
Delete ckpt.t7 from the checkpoint folder, then run the following from the deep directory:
python train.py --data-dir ./car-reid-dataset
3.5 Demo
Multi-object vehicle counting based on YOLOv5-DeepSort:
python count_car.py
Code Download
Baidu Netdisk link: https://pan.baidu.com/s/1XyQcMbiOA2nsRfwT3j3P8w Password: 8khk
