Multimodal Datasets, Paper Reading | WRDI: A Multimodal Dataset of mmWave Radar Data and Image, 2021 IEEE Big Data
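Paper covered in these notes: Zhang Wei, Jin Cong, Sun Lin, and Li Yong, "WRDI: A Multimodal Dataset of mmWave Radar Data and Image," IEEE International Conference on Big Data (Big Data), 2021, pp. 3209-3214. The WRDI dataset combines mmWave radar data with images and is intended to support signal processing and intelligent recognition. By building a complete mmWave radar and image database and applying deep learning algorithms to it, the work aims to improve object detection accuracy while reducing computational complexity, and to improve overall system performance through efficient feature extraction and an optimized algorithm framework.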

Abstract
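Fusing mmWave radar data with image data is important for security monitoring. The combination offers good real-time processing and highly sensitive detection, operates in all weather and at all times of day, and can effectively identify and reject objects that are not real.
Challenges
The main contribution of this paper is a new dataset.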
1 Introduction
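Security (intruder detection): mmWave radar and cameras complement each other for intruder detection.
Camera (the main terminal device)
- Advantage: captures fine texture and rich color information
- Disadvantage: struggles to capture enough information in adverse conditions
mmWave radar
- Advantages: wide sensing range and strong penetration of obstacles; performs well in bad weather and under varying illumination
- Disadvantage: sparse information with little scene detail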
The millimeter-wave radar and optical sensors' data are integrated into the security monitoring system to enhance performance.
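Data fusion (camera + mmWave radar): state of research
Existing research
In most cases, radar and image multimodal data are collected on urban roads and are used mainly for autonomous driving rather than security monitoring. They also suffer from low quality, for example high noise and missing data.
It is therefore necessary to build a high-quality, diverse, and comprehensive mmWave radar and image multimodal dataset.
Contributions of this paper
- WRDI: an mmWave radar data and image multimodal dataset containing 5,000 frames
  - Indoor and outdoor scenes
  - Night, rain, and fog conditions
- Significance: the first multimodal dataset for the security monitoring field.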
2 Related Work
This section covers:
- Single-modality datasets
- Multimodal datasets
- An analysis of their strengths and weaknesses
2.1 Image Datasets
- Features: images lack three-dimensional information, are not robust to day and night variation, and offer no way to filter out false objects such as people pictured on billboards.
- Examples: ImageNet, COCO
2.2 Radar Datasets
- Features (mmWave radar):
  - Wide sensing range
  - Works in all weather and at all times of day
  - Low resolution
- Examples: MSTAR, a SAR dataset released by DARPA
2.3 Radar and Image Multimodal Dataset
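- Features: data are collected with multiple sensors (camera and radar)
  - Drawback: the sensors are deployed in urban traffic scenes, so the diversity of sensing domains is severely limited
- Examples:
  - nuScenes: camera + LiDAR + radar, 1000+ scenes, annotations
  - Oxford dataset
Camera and mmWave radar complement each other.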
3 mmWave Radar and Image Multimodal Dataset
- Equipment and deployment
- Data collection process
- Data post-processing
- Data annotation and statistics
3.1 Equipment and Deployment
- Radar: AWR 2243
  - 3 TX, 4 RX, 77-81 GHz
  - DCA1000EVM-BOOST
- Camera: HIKVISION 2SC3Q140MYTE
  - 1280×720
  - RTSP (Real-Time Streaming Protocol)
- Terminal devices:
  - Camera data processing: Jetson Nano B01 (1.43 GHz, 4 GB RAM), Ubuntu 18.04 LTS
  - Radar data processing: LattePanda Alpha (2.6 GHz, 8 GB RAM), Windows 10
- Data handling:
  - TI Studio: processes the raw ADC data.
  - Python: converts the data format over the entire data range, and computes Doppler and angle with the fast Fourier transform (FFT).
- Settings: as shown in the figure below
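The FFT-based processing mentioned in the data handling step can be sketched with NumPy. This is a minimal illustration, not the authors' pipeline: the input layout `(num_rx, num_chirps, num_samples)`, the Hann windows, the `angle_bins` zero-padding, and the function name `radar_cube` are all assumptions made for the example.

```python
import numpy as np

def radar_cube(adc: np.ndarray, angle_bins: int = 64) -> np.ndarray:
    """Turn one frame of complex ADC samples into a magnitude cube.

    adc is assumed to have shape (num_rx, num_chirps, num_samples).
    The result has shape (angle_bins, num_chirps, num_samples), i.e.
    (angle, Doppler, range).
    """
    num_rx, num_chirps, num_samples = adc.shape
    # Hann windows along fast time (samples) and slow time (chirps)
    # reduce spectral leakage; the window choice is an assumption.
    win_range = np.hanning(num_samples)
    win_doppler = np.hanning(num_chirps)[:, None]
    x = adc * win_range * win_doppler
    # Range FFT over samples within a chirp.
    x = np.fft.fft(x, axis=2)
    # Doppler FFT over chirps; fftshift puts zero velocity in the middle.
    x = np.fft.fftshift(np.fft.fft(x, axis=1), axes=1)
    # Angle FFT over receive antennas, zero-padded to angle_bins.
    x = np.fft.fftshift(np.fft.fft(x, n=angle_bins, axis=0), axes=0)
    return np.abs(x)
```

A static target straight ahead shows up at the center of the angle and Doppler axes and at the range bin that matches its beat frequency.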

3.2 Collection Process
3.2.1 Sensor Calibration
- Intrinsic parameters: the camera's intrinsic parameters and radial distortion coefficients must be calibrated. The recommended approach is Zhang's camera calibration method [12].
- Extrinsic parameters: the radar is taken as the origin of the local coordinate system.
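Once intrinsics and extrinsics are known, a radar detection can be mapped to a pixel. The sketch below uses a plain pinhole model and ignores lens distortion. The rotation `R`, translation `t`, and the focal lengths and principal point (`fx`, `fy`, `cx`, `cy`) are hypothetical inputs, and `R` is assumed to map radar coordinates into a camera frame whose z-axis points forward.

```python
def project_radar_point(p_radar, R, t, fx, fy, cx, cy):
    """Project a 3D point from the radar frame onto the image plane.

    p_radar: (x, y, z) in the radar frame (the local origin).
    R, t: 3x3 rotation (nested lists) and translation, radar -> camera.
    Returns (u, v) in pixels, or None if the point is behind the camera.
    """
    # Radar frame -> camera frame: p_cam = R * p_radar + t
    xc, yc, zc = (
        sum(R[i][j] * p_radar[j] for j in range(3)) + t[i] for i in range(3)
    )
    if zc <= 0:
        return None  # not visible to the camera
    # Pinhole projection without distortion.
    u = fx * xc / zc + cx
    v = fy * yc / zc + cy
    return u, v
```

For a 1280×720 image with the principal point at the center, a point on the optical axis projects to (640, 360).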
3.2.2 Collection Scene
- Light conditions: daytime and nighttime
- Indoor scenes:
  - Conference room: wide field of view, close to the center of the room; 6 m wide and 9 m long.
  - Corridor: narrow field of view over a long distance; 2.5 m wide and 50 m long.
- Outdoor scenes:
  - Campus playground: many pedestrians and a variety of sports equipment; 180 m by 92 m.
  - Roads: complex scenes with a mix of pedestrians and vehicles; each road is 8 m wide.
3.2.3 Collection Object
- Two categories: people and vehicles
- People:
  - Standing people
  - Slow-walking people
  - Fast-walking people (speed < 2 m/s)
- Vehicles (cars):
  - Stationary cars
  - Slow-moving cars (< 6 m/s)
  - Fast-moving cars
- Note: the data also include false objects, for example human figures on posters and portraits on canvas.
3.2.4 Collection settings
- Frame rate:
  - Video: 15 fps
  - Radar: 16 fps
- Hard and Easy subsets:
  - Hard: the object is more than 25 m away, or moves at high speed, or has its window or head occluded
  - Easy: otherwise
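The Hard/Easy rule can be written as a small predicate. This is a sketch only: the notes do not say what counts as "high speed", so the 6 m/s default below is an assumed placeholder, and the `Sample` fields are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class Sample:
    distance_m: float   # distance from the sensors to the object
    speed_mps: float    # object speed
    occluded: bool      # vehicle window or person's head is blocked

def difficulty(s: Sample,
               max_easy_distance_m: float = 25.0,
               high_speed_mps: float = 6.0) -> str:
    """Return "hard" if any Hard condition holds, otherwise "easy".

    high_speed_mps is an assumed threshold, not a value from the paper.
    """
    if (s.distance_m > max_easy_distance_m
            or s.speed_mps > high_speed_mps
            or s.occluded):
        return "hard"
    return "easy"
```

Because the three conditions are joined with "or", a single violation is enough to move a sample into the Hard subset.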
3.3 Data Processing
- Step 1: extract the camera and radar frames
- Step 2: remove sensitive information, such as license plates, from the camera frames
- Step 3: align the frames temporally and spatially
- Step 4: transform the raw radar data into a 3D radar heatmap (referred to as a radar cube)
Data format (see the figure below):
- groundtruth.npz: contains the labels of the dataset
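Temporal alignment matters because the video runs at 15 fps and the radar at 16 fps, so frame indices drift apart. One simple way to pair them is nearest-timestamp matching, sketched below. The function name, the timestamp lists, and the `max_gap_s` tolerance (half a radar frame period) are assumptions for illustration; the notes do not describe the authors' alignment procedure.

```python
def align_frames(cam_ts, radar_ts, max_gap_s=0.03125):
    """Pair each camera frame with the nearest radar frame in time.

    cam_ts, radar_ts: timestamps in seconds, sorted in ascending order.
    Returns a list of (camera_index, radar_index) pairs; camera frames
    with no radar frame within max_gap_s are dropped.
    """
    pairs = []
    j = 0
    for i, tc in enumerate(cam_ts):
        # Advance the radar pointer while the next frame is at least as close.
        while (j + 1 < len(radar_ts)
               and abs(radar_ts[j + 1] - tc) <= abs(radar_ts[j] - tc)):
            j += 1
        if radar_ts and abs(radar_ts[j] - tc) <= max_gap_s:
            pairs.append((i, j))
    return pairs
```

Over one second of ideal timestamps, the 15 camera frames each find a radar partner, and one of the 16 radar frames is left unused.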

3.4 Data Annotations
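- Two classes are labeled: pedestrians and vehicles
- Annotations for each object:
  - The bounding box is given by the coordinates (x, y, h, w).
  - The class is given as a one-hot vector.
  - The azimuth and speed of the object are measured from the radar data (after spatial alignment).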
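To make the annotation fields concrete, the sketch below packs them into NumPy arrays and round-trips them through the `.npz` format. The key names (`bbox`, `cls`, `azimuth_deg`, `speed_mps`), the units, and the class order are hypothetical; the notes do not describe the internal layout of groundtruth.npz.

```python
import io
import numpy as np

CLASSES = ("pedestrian", "vehicle")  # assumed class order

def one_hot(label: str) -> np.ndarray:
    """Encode a class label as a one-hot vector over CLASSES."""
    vec = np.zeros(len(CLASSES), dtype=np.float32)
    vec[CLASSES.index(label)] = 1.0
    return vec

def pack_annotations(objects):
    """objects: list of dicts with bbox (x, y, h, w), label, azimuth_deg, speed_mps."""
    return {
        "bbox": np.array([o["bbox"] for o in objects], dtype=np.float32),
        "cls": np.stack([one_hot(o["label"]) for o in objects]),
        "azimuth_deg": np.array([o["azimuth_deg"] for o in objects], dtype=np.float32),
        "speed_mps": np.array([o["speed_mps"] for o in objects], dtype=np.float32),
    }

def roundtrip(arrays):
    """Write the arrays to an in-memory .npz archive and read them back."""
    buf = io.BytesIO()
    np.savez(buf, **arrays)
    buf.seek(0)
    with np.load(buf) as data:
        return {key: data[key] for key in data.files}
```

In practice the archive would be written to a file path instead of an in-memory buffer; the buffer only keeps the example self-contained.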
3.5 Statistical Information
- Object categories:
  - 30 pedestrians
  - 4 types of motor vehicles
  - 5 types of false targets
  - Each object was recorded for a total of at least 20 minutes.
- Train and test data:
  - 5,000 frames in total
  - Split 7:3 into 3,500 training frames and 1,500 test frames
- Comparison with other datasets:
  - nuScenes and KITTI are primarily intended for autonomous driving rather than security monitoring.
  - Compared with RadarScenes, the data distribution in this dataset is uneven.
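The 7:3 split of 5,000 frames can be reproduced with a seeded shuffle. This is only a sketch: the notes do not say whether the authors split at random, by scene, or by recording session, and `split_frames` and its parameters are hypothetical names.

```python
import random

def split_frames(frame_ids, train_ratio=0.7, seed=0):
    """Shuffle frame ids with a fixed seed and split them into train and test."""
    ids = list(frame_ids)
    random.Random(seed).shuffle(ids)  # local RNG, so global state is untouched
    cut = round(len(ids) * train_ratio)
    return ids[:cut], ids[cut:]

train_ids, test_ids = split_frames(range(5000))
```

Splitting by recording session rather than by individual frame would keep near-duplicate neighboring frames out of the test set, which may matter for video data.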
4 Evaluation
WRDI is intended for training and evaluating both object detection and motion analysis (tracking) tasks.
The following subsections describe the evaluation metrics for object detection and tracking on WRDI.
4.1 Object Detection
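- Mean Average Precision (mAP):
  - AP@50, AP@75, AP@95: average precision at IoU thresholds of 0.50, 0.75, and 0.95, reflecting the model's detection capability under increasingly strict localization requirements.
  - AP@50:5:95: compute the mAP at every IoU threshold from 0.50 to 0.95 in steps of 0.05, then average all of these values.
- The overall performance of a model is assessed across the different IoU thresholds.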
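The IoU thresholds behind these metrics can be illustrated with a short sketch. Boxes use the (x, y, h, w) order from the annotation format, and treating (x, y) as the top-left corner is an assumption. `mean_over_thresholds` takes any callable that returns the AP at a given threshold; computing the AP itself (matching detections and integrating the precision-recall curve) is outside this sketch.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x, y, h, w).

    (x, y) is assumed to be the top-left corner.
    """
    ax, ay, ah, aw = a
    bx, by, bh, bw = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

# IoU thresholds 0.50, 0.55, ..., 0.95 used by AP@50:5:95.
IOU_THRESHOLDS = [round(0.50 + 0.05 * k, 2) for k in range(10)]

def mean_over_thresholds(ap_at):
    """Average ap_at(threshold) over IOU_THRESHOLDS."""
    return sum(ap_at(t) for t in IOU_THRESHOLDS) / len(IOU_THRESHOLDS)
```

A detection whose box overlaps the ground truth with an IoU of 0.6 counts as a true positive for AP@50 but as a miss for AP@75 and AP@95.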
4.2 Object Tracking
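MOTA: multi-object tracking accuracy
$$\mathrm{MOTA} = 1 - \frac{\sum_t \left(\mathrm{FN}_t + \mathrm{FP}_t + \mathrm{IDSW}_t\right)}{\sum_t \mathrm{GT}_t}$$
Here FN, FP, IDSW, and GT are the numbers of false negatives, false positives, identity switches, and ground-truth objects in frame t.
Other metrics include MOTP, IDP, IDR, and IDF1.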
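As a worked example of the MOTA formula, the sketch below sums per-frame error counts and divides by the total number of ground-truth objects. The per-frame tuple layout `(fn, fp, idsw, gt)` and the function name are assumptions for the example; matching tracks to ground truth is not shown.

```python
def mota(per_frame):
    """Compute MOTA from per-frame counts.

    per_frame: iterable of (fn, fp, idsw, gt) tuples, one per frame, giving
    false negatives, false positives, identity switches, and the number of
    ground-truth objects.
    """
    errors = 0
    gt_total = 0
    for fn, fp, idsw, gt in per_frame:
        errors += fn + fp + idsw
        gt_total += gt
    if gt_total == 0:
        raise ValueError("MOTA is undefined without ground-truth objects")
    return 1.0 - errors / gt_total

# Two frames with 10 objects each: one miss, one false alarm, one switch.
score = mota([(1, 0, 0, 10), (0, 1, 1, 10)])  # 1 - 3/20 = 0.85
```

MOTA can be negative when the tracker makes more errors than there are ground-truth objects.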
5 Conclusion
- WRDI is a camera + mmWave radar multimodal dataset for security monitoring.
