Geometry-guided Kernel Transformer: reproducing the GKT code
1. Download the code: https://github.com/hustvl/GKT
git clone https://github.com/hustvl/GKT
2. Create the environment:
conda create -n GKT python=3.8 -y
3. Install the dependencies:
cd segmentation
pip install -r requirements.txt  # the original author's instructions get this line wrong
python setup.py develop
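Running the last command in this environment produced an error. The cause turned out to be a permissions problem, and the fix was to grant all users access to the Anaconda installation.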
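One way to confirm a permissions problem of this kind is to check whether the current user can write to the environment's site-packages directory, which is where `python setup.py develop` records its development link. The sketch below is only an illustration and is not part of the GKT repository; the helper name `is_writable` is hypothetical.

```python
import os
import sysconfig


def is_writable(path: str) -> bool:
    """Return True if `path` is a directory the current user may create files in."""
    return os.path.isdir(path) and os.access(path, os.W_OK | os.X_OK)


# site-packages of the interpreter that is running this script
site_packages = sysconfig.get_paths()["purelib"]

if __name__ == "__main__":
    status = "writable" if is_writable(site_packages) else "NOT writable"
    print(f"{site_packages}: {status}")
```

Run it with the Python interpreter of the GKT environment; if the directory is reported as not writable, the install step is likely to fail until the permissions are fixed.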
4. Organize the data
For a quick trial you can download nuScenes v1.0-mini; alternatively, download all the keyframes of v1.0-trainval.
Download the authors' labels
Link: https://www.cs.utexas.edu/~bzhou/cvt/cvt_labels_nuscenes.tar.gz
Extract the labels and the nuScenes data
# Create the target directories first so that tar -C does not fail
mkdir -p /media/datasets/nuscenes/
tar -xvf /path/to/downloads/cvt_labels_nuscenes.tar.gz -C /media/datasets
# Untar all the keyframes and metadata
for f in /path/to/downloads/v1.0-*.tgz; do tar -xzf "$f" -C /media/datasets/nuscenes; done
# Map expansion must go into the maps folder
unzip /path/to/downloads/nuScenes-map-expansion-v1.3.zip -d /media/datasets/nuscenes/maps
The resulting layout looks like this:
/media/datasets/
├─ nuscenes/
│  ├─ v1.0-trainval/
│  ├─ v1.0-mini/
│  ├─ samples/
│  ├─ sweeps/
│  └─ maps/
│     ├─ basemap/
│     └─ expansion/
└─ cvt_labels_nuscenes/
   ├─ scene-0001/
   ├─ scene-0001.json
   ├─ ...
   ├─ scene-1000/
   └─ scene-1000.json
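Before generating labels it can save time to confirm that the directories from the tree above actually exist. The following is a minimal sketch, assuming the `/media/datasets` root used in this post; the function name `missing_dirs` and the list `EXPECTED_DIRS` are hypothetical helpers, not part of GKT.

```python
from pathlib import Path

# Relative paths taken from the directory tree in this post.
EXPECTED_DIRS = [
    "nuscenes/samples",
    "nuscenes/sweeps",
    "nuscenes/maps/basemap",
    "nuscenes/maps/expansion",
    "cvt_labels_nuscenes",
]


def missing_dirs(root, version="v1.0-trainval"):
    """List the expected directories that do not exist under `root`."""
    root = Path(root)
    expected = EXPECTED_DIRS + [f"nuscenes/{version}"]
    return [rel for rel in expected if not (root / rel).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs("/media/datasets")
    print("layout OK" if not missing else f"missing: {missing}")
```

Pass `version="v1.0-mini"` when only the mini split has been downloaded.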
5. Generate the labels
Install the required library
pip install nuscenes-devkit==1.1.7
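Generate the labels
Full dataset. The original author's instructions contain a mistake here; the script to run is generate_data.py.
# With visualization
python3 scripts/generate_data.py \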
data=nuscenes \
data.version=v1.0-trainval \
data.dataset_dir=/media/datasets/nuscenes \
data.labels_dir=/media/datasets/cvt_labels_nuscenes \
visualization=nuscenes_viz
# Without visualization
python3 scripts/generate_data.py \
data=nuscenes \
data.version=v1.0-trainval \
data.dataset_dir=/media/datasets/nuscenes \
data.labels_dir=/media/datasets/cvt_labels_nuscenes
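If you are using the mini dataset, simply replace v1.0-trainval in the commands above with v1.0-mini.
6. You can also download the pretrained weights. They are hosted on Google Drive, so a VPN may be needed to reach them.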
https://drive.google.com/file/d/1WyVwxykkh3jlSW8HiT3NKtJjISBsUaiq/view?usp=sharing
mkdir pretrained_models
cd pretrained_models
# Put the pretrained weights here
7. Training, testing, and speed benchmark
# Training
python scripts/train.py +experiment=gkt_nuscenes_vehicle_kernel_7x1.yaml data.dataset_dir=<path/to/nuScenes> data.labels_dir=<path/to/labels>
# Testing (evaluation on a checkpoint)
python scripts/eval.py +experiment=gkt_nuscenes_vehicle_kernel_7x1.yaml data.dataset_dir=<path/to/nuScenes> data.labels_dir=<path/to/labels> experiment.ckptt=<path/to/checkpoint>
# Speed benchmark
python scripts/speed.py +experiment=gkt_nuscenes_vehicle_kernel_7x1.yaml data.dataset_dir=<path/to/nuScenes> data.labels_dir=<path/to/labels>
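The three commands above differ only in the script name and the optional checkpoint override, so they can be assembled from one helper. This is a hedged sketch rather than anything shipped with GKT: `build_command` is a hypothetical name, and the `experiment.ckptt` key is spelled exactly as in the eval command used in this post.

```python
import shlex

EXPERIMENT = "gkt_nuscenes_vehicle_kernel_7x1.yaml"


def build_command(script, dataset_dir, labels_dir, ckpt=None):
    """Return the argument list for scripts/train.py, eval.py or speed.py."""
    cmd = [
        "python", f"scripts/{script}.py",
        f"+experiment={EXPERIMENT}",
        f"data.dataset_dir={dataset_dir}",
        f"data.labels_dir={labels_dir}",
    ]
    if ckpt is not None:
        # Overrides are written as key=value; key spelled as in this post.
        cmd.append(f"experiment.ckptt={ckpt}")
    return cmd


if __name__ == "__main__":
    cmd = build_command("train", "/media/datasets/nuscenes",
                        "/media/datasets/cvt_labels_nuscenes")
    # Print the command; subprocess.run(cmd, check=True) would launch it.
    print(shlex.join(cmd))
```

Building the command as a list keeps paths with spaces intact when it is later handed to `subprocess.run`.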
8. Training:
Here is the command I ran:
python scripts/train.py +experiment=gkt_nuscenes_vehicle_kernel_7x1.yaml data.dataset_dir=/home/wxq/GKT/media/datasets/nuscenes data.labels_dir=/home/wxq/GKT/media/datasets/cvt_labels_nuscenes
Log output:
Global seed set to 2022
Loaded pretrained weights for efficientnet-b4
[2023-02-02 21:16:45,214][torch.distributed.nn.jit.instantiator][INFO] - Created a temporary directory at /tmp/tmp_l4_325q
[2023-02-02 21:16:45,214][torch.distributed.nn.jit.instantiator][INFO] - Writing /tmp/tmp_l4_325q/_remote_module_non_sriptable.py
[2023-02-02 21:16:45,352][main][INFO] - Searching /home/wxq/GKT/segmentation/logs.
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
Trainer(limit_train_batches=1.0) was configured so 100% of the batches per epoch will be used..
Trainer(limit_val_batches=1.0) was configured so 100% of the batches will be used..
Trainer(val_check_interval=1.0) was configured so validation will run at the end of the training epoch..
Global seed set to 2022
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/1
[2023-02-02 21:16:45,412][torch.distributed.distributed_c10d][INFO] - Added key: store_based_barrier_key:1 to store for rank: 0
[2023-02-02 21:16:45,412][torch.distributed.distributed_c10d][INFO] - Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes.
----------------------------------------------------------------------------------------------------
distributed_backend=nccl
All distributed processes registered. Starting with 1 processes
----------------------------------------------------------------------------------------------------
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
  | Name      | Type                 | Params
---------------------------------------------------
0 | backbone  | CrossViewTransformer | 1.2 M
1 | loss_func | MultipleLoss         | 0
2 | metrics   | MetricCollection     | 0
---------------------------------------------------
1.2 M     Trainable params
0         Non-trainable params
1.2 M     Total params
4.701     Total estimated model params size (MB)
/home/wxq/GKT/segmentation/cross_view_transformer/tabular_logger.py:36: UserWarning: Experiment logs directory /home/wxq/GKT/segmentation/logs/lightning_logs/version_11 exists and is not empty. Previous log files in this directory will be deleted when the new ones are saved!
rank_zero_warn(
/home/wxq/anaconda3/envs/GKT/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:486: PossibleUserWarning: Your val_dataloader's sampler has shuffling enabled, it is strongly recommended that you turn shuffling off for val/test/predict dataloaders.
rank_zero_warn(
[2023-02-02 21:16:49,767][cross_view_transformer.tabular_logger][INFO] - lr-AdamW:0.000400, step:0
/home/wxq/anaconda3/envs/GKT/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py:2719: UserWarning: Using trainer.logger when Trainer is configured to use multiple loggers. This behavior will change in v1.8 when LoggerCollection is removed, and trainer.logger will return the first logger in trainer.loggers
rank_zero_warn(
/home/wxq/anaconda3/envs/GKT/lib/python3.8/site-packages/pytorch_lightning/utilities/warnings.py:44: LightningDeprecationWarning: pytorch_lightning.utilities.warnings.rank_zero_warn has been deprecated in v1.6 and will be removed in v1.8. Use the equivalent function from the pytorch_lightning.utilities.rank_zero module instead.
new_rank_zero_deprecation(
/home/wxq/anaconda3/envs/GKT/lib/python3.8/site-packages/pytorch_lightning/utilities/warnings.py:49: UserWarning: Invalid logger <pytorch_lightning.loggers.base.LoggerCollection object at 0x7f18907cccd0>
return new_rank_zero_warn(*args, **kwargs)
[2023-02-02 21:16:54,787][root][INFO] - Reducer buckets have been rebuilt in this iteration.
[2023-02-02 21:17:21,545][cross_view_transformer.tabular_logger][INFO] - train/loss_step:0.039249, train/loss/visible_step:0.032033, train/loss/center_step:0.072163, epoch:0, step:49
[2023-02-02 21:17:48,963][cross_view_transformer.tabular_logger][INFO] - train/loss_step:0.042063, train/loss/visible_step:0.038138, train/loss/center_step:0.039251, epoch:0, step:99
[2023-02-02 21:18:16,449][cross_view_transformer.tabular_logger][INFO] - train/loss_step:0.026050, train/loss/visible_step:0.023647, train/loss/center_step:0.024033, epoch:0, step:149
[2023-02-02 21:18:43,918][cross_view_transformer.tabular_logger][INFO] - train/loss_step:0.029235, train/loss/visible_step:0.027685, train/loss/center_step:0.015508, epoch:0, step:199
[2023-02-02 21:19:11,457][cross_view_transformer.tabular_logger][INFO] - train/loss_step:0.016289, train/loss/visible_step:0.015199, train/loss/center_step:0.010893, epoch:0, step:249
[2023-02-02 21:19:38,981][cross_view_transformer.tabular_logger][INFO] - train/loss_step:0.014259, train/loss/visible_step:0.013455, train/loss/center_step:0.008041, epoch:0, step:299
[2023-02-02 21:20:06,562][cross_view_transformer.tabular_logger][INFO] - train/loss_step:0.024729, train/loss/visible_step:0.024085, train/loss/center_step:0.006436, epoch:0, step:349
[2023-02-02 21:20:34,088][cross_view_transformer.tabular_logger][INFO] - train/loss_step:0.012128, train/loss/visible_step:0.011636, train/loss/center_step:0.004923, epoch:0, step:399
[2023-02-02 21:21:01,596][cross_view_transformer.tabular_logger][INFO] - train/loss_step:0.013290, train/loss/visible_step:0.012880, train/loss/center_step:0.004103, epoch:0, step:449
[2023-02-02 21:21:29,067][cross_view_transformer.tabular_logger][INFO] - train/loss_step:0.017175, train/loss/visible_step:0.016829, train/loss/center_step:0.003464, epoch:0, step:499
Training ran for several epochs and then, for reasons I could not determine, crashed partway through with the following:
/home/wxq/anaconda3/envs/GKT/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py:2719: UserWarning: Using trainer.logger when Trainer is configured to use multiple loggers. This behavior will change in v1.8 when LoggerCollection is removed, and trainer.logger will return the first logger in trainer.loggers
rank_zero_warn(
/home/wxq/anaconda3/envs/GKT/lib/python3.8/site-packages/pytorch_lightning/utilities/warnings.py:44: LightningDeprecationWarning: pytorch_lightning.utilities.warnings.rank_zero_warn has been deprecated in v1.6 and will be removed in v1.8. Use the equivalent function from the pytorch_lightning.utilities.rank_zero module instead.
new_rank_zero_deprecation(
/home/wxq/anaconda3/envs/GKT/lib/python3.8/site-packages/pytorch_lightning/utilities/warnings.py:49: UserWarning: Invalid logger <pytorch_lightning.loggers.base.LoggerCollection object at 0x7f1993a24f40>
return new_rank_zero_warn(*args, **kwargs)
[2023-02-03 02:27:02,722][cross_view_transformer.tabular_logger][INFO] - train/loss_step:0.007324, train/loss/visible_step:0.007275, train/loss/center_step:0.000486, epoch:4, step:29649
[2023-02-03 02:27:30,199][cross_view_transformer.tabular_logger][INFO] - train/loss_step:0.005525, train/loss/visible_step:0.005477, train/loss/center_step:0.000483, epoch:4, step:29699
[2023-02-03 02:27:57,726][cross_view_transformer.tabular_logger][INFO] - train/loss_step:0.008282, train/loss/visible_step:0.008228, train/loss/center_step:0.000543, epoch:4, step:29749
[2023-02-03 02:28:25,204][cross_view_transformer.tabular_logger][INFO] - train/loss_step:0.004319, train/loss/visible_step:0.004281, train/loss/center_step:0.000381, epoch:4, step:29799
[2023-02-03 02:28:52,707][cross_view_transformer.tabular_logger][INFO] - train/loss_step:0.003136, train/loss/visible_step:0.003106, train/loss/center_step:0.000293, epoch:4, step:29849
[2023-02-03 02:29:20,174][cross_view_transformer.tabular_logger][INFO] - train/loss_step:0.005958, train/loss/visible_step:0.005909, train/loss/center_step:0.000488, epoch:4, step:29899
[2023-02-03 02:29:47,677][cross_view_transformer.tabular_logger][INFO] - train/loss_step:0.007348, train/loss/visible_step:0.007289, train/loss/center_step:0.000595, epoch:4, step:29949
[2023-02-03 02:30:15,226][cross_view_transformer.tabular_logger][INFO] - train/loss_step:0.007833, train/loss/visible_step:0.007781, train/loss/center_step:0.000523, epoch:4, step:29999
[2023-02-03 02:30:15,883][cross_view_transformer.tabular_logger][INFO] - train/loss_epoch:0.008613, train/loss/visible_epoch:0.008551, train/loss/center_epoch:0.000621, epoch:5, step:30000
Segmentation fault (core dumped)
The error is a segmentation fault. I believe the cause is my machine rather than the code itself.
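To see how the loss evolves up to a crash like this, the `tabular_logger` lines in the log can be parsed into numbers. The sketch below assumes the exact line format shown in this post (a `[cross_view_transformer.tabular_logger][INFO] - ` prefix followed by comma-separated `key:value` pairs); `parse_metrics` is a hypothetical helper, not part of the repository.

```python
MARKER = "[cross_view_transformer.tabular_logger][INFO] - "

# Two lines copied from the training log in this post.
SAMPLE = [
    "[2023-02-02 21:17:21,545][cross_view_transformer.tabular_logger][INFO] - "
    "train/loss_step:0.039249, train/loss/visible_step:0.032033, "
    "train/loss/center_step:0.072163, epoch:0, step:49",
    "[2023-02-03 02:30:15,226][cross_view_transformer.tabular_logger][INFO] - "
    "train/loss_step:0.007833, train/loss/visible_step:0.007781, "
    "train/loss/center_step:0.000523, epoch:4, step:29999",
]


def parse_metrics(line):
    """Return {name: value} for a tabular_logger line, or {} for any other line."""
    _, found, payload = line.partition(MARKER)
    if not found:
        return {}
    metrics = {}
    for item in payload.split(","):
        key, colon, value = item.strip().rpartition(":")
        if colon:
            metrics[key] = float(value)
    return metrics


if __name__ == "__main__":
    for line in SAMPLE:
        m = parse_metrics(line)
        print(int(m["step"]), m["train/loss_step"])
```

For the segmentation fault itself, Python's standard `faulthandler` module (for example `python -X faulthandler scripts/train.py ...`) prints the Python traceback at the moment of the crash, which may help tell a code problem from a hardware or driver problem.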
9. Evaluation
Use a saved checkpoint (checkpoints are stored in the output directory). Here is the command I ran:
python scripts/eval.py +experiment=gkt_nuscenes_vehicle_kernel_7x1.yaml data.dataset_dir=/home/wxq/GKT/media/datasets/nuscenes data.labels_dir=/home/wxq/GKT/media/datasets/cvt_labels_nuscenes experiment.ckptt=/home/wxq/GKT/segmentation/pretrained_models/model-v1.ckpt
Results:
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
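Validation metrics, DataLoader 0
────────────────────────────────────
IoU (reported for the training set): 0.0 at threshold 0.40; 0.0 at threshold 0.50
IoU with occlusions (reported for the training set): 36.3% at threshold 0.40; 32.4% at threshold 0.50
Validation loss: about 1.198e-3
Validation center loss: about 6.37e-7
Validation visible loss: about 1.185e-3
IoU with occlusions (validation set): 32.8% at threshold 0.40; 26.7% at threshold 0.50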
────────────
Note: a memory corruption error (!prev) was triggered (core dumped)
This run also ended with an "Aborted" error, haha.
10. Speed test
python scripts/speed.py +experiment=gkt_nuscenes_vehicle_kernel_7x1.yaml data.dataset_dir=/home/wxq/GKT/media/datasets/nuscenes data.labels_dir=/home/wxq/GKT/media/datasets/cvt_labels_nuscenes
Results:
Configure mixed-precision settings by setting +mixed_precision=True
Measure CPU performance with device set to CPU
Global seed set to 2022
Loaded pretrained weights for efficientnet-b4
Inference latency: 31.349 ms, speed: 31.898 fps
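The two reported numbers are two views of the same measurement: if one frame takes `latency_ms` milliseconds, throughput is roughly `1000 / latency_ms` frames per second. The sketch below assumes one sample per forward pass, which is an assumption about the benchmark rather than something stated in the log; `fps_from_latency` is a hypothetical helper.

```python
def fps_from_latency(latency_ms: float) -> float:
    """Frames per second when one frame takes `latency_ms` milliseconds."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms


if __name__ == "__main__":
    # Latency value taken from the speed test output in this post.
    print(f"{fps_from_latency(31.349):.2f} fps")
```

Applied to 31.349 ms this gives a value within about 0.001 fps of the logged 31.898 fps; the small gap comes from rounding in the printed latency.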
End
