
Real Time End-to-End 3D Detection, Tracking and Motion Forecasting with a Single Convolutional Net


Motivation

This paper uses a single, fairly shallow network to jointly perform 3D object detection, tracking and motion forecasting, representing the scene in bird's-eye view (BEV). Input: a 4D tensor (X, Y, Z, T). Output: N BEV maps with forecasted bounding boxes.

Notes

This approach can lead to catastrophic consequences, as subsequent processes are unable to correct or recover from errors that emerge early in the pipeline. – In a cascaded pipeline, downstream modules cannot fix errors made by the upstream modules.

We believe this perspective is crucial as tracking and prediction capabilities can play a pivotal role in improving object detection. Specifically, by integrating tracking and prediction data, we are able to effectively minimize false negatives in scenarios involving occluded or distant objects. Additionally, false positives can be mitigated through the systematic accumulation of evidence over time.
– The end-to-end advantage: detection, tracking and prediction can corroborate one another.

– The bounding-box prediction scheme in this paper follows:
SSD: Single shot multibox detector

The paper does not use 3D convolutions; instead it applies 2D convolutions, as in: Multi-view 3D object detection network for autonomous driving.

Voxel Representation


We then assign a binary label to each voxel indicating whether it is occupied. Instead of 3D convolutions, we perform 2D convolutions and treat the height dimension as the channel dimension.

– [cc] A fairly classic way of handling this.
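[cc] A minimal sketch of this voxelization in Python, assuming a BEV range and 0.2 m resolution that I picked for illustration (the paper's exact grid parameters may differ):

```python
import numpy as np

def voxelize_bev(points, x_range=(0.0, 70.4), y_range=(-40.0, 40.0),
                 z_range=(-3.0, 1.0), res=0.2):
    """Turn an Nx3 LiDAR point cloud into a binary occupancy grid.

    The height (z) bins are kept as the channel dimension, so a plain 2D CNN
    can consume the result as an image of shape (Z, Y, X).
    """
    nz = int(round((z_range[1] - z_range[0]) / res))
    ny = int(round((y_range[1] - y_range[0]) / res))
    nx = int(round((x_range[1] - x_range[0]) / res))
    grid = np.zeros((nz, ny, nx), dtype=np.float32)

    ix = np.floor((points[:, 0] - x_range[0]) / res).astype(int)
    iy = np.floor((points[:, 1] - y_range[0]) / res).astype(int)
    iz = np.floor((points[:, 2] - z_range[0]) / res).astype(int)
    keep = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny) & (iz >= 0) & (iz < nz)
    grid[iz[keep], iy[keep], ix[keep]] = 1.0  # occupied voxel -> 1, empty stays 0
    return grid
```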

Adding Temporal Information


We take the 3D points from each of the preceding n frames and transform their coordinates so that they are expressed in the current vehicle coordinate system.

[cc] Earlier frames are transformed into the coordinate system of the current frame. The paper does not explain how this transformation is done! A simple approach I can think of: derive R/T from the current vehicle state (6 DoF) and use it to map the previous frames back.

We can then stack multiple frames along the newly introduced temporal dimension, yielding a 4D tensor.
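[cc] A rough sketch of the transform-and-stack step, assuming each frame comes with a 4×4 world-from-vehicle pose matrix (my own interface; the paper does not spell this out). It reuses the voxelize_bev sketch above:

```python
import numpy as np

def to_current_frame(pts_past, T_world_from_past, T_world_from_curr):
    """Re-express Nx3 points from a past vehicle frame in the current one."""
    T = np.linalg.inv(T_world_from_curr) @ T_world_from_past  # current <- past
    homo = np.hstack([pts_past, np.ones((len(pts_past), 1))])  # Nx4 homogeneous
    return (homo @ T.T)[:, :3]

def build_4d_tensor(point_clouds, poses, voxelize):
    """Stack n frames into a (T, Z, Y, X) occupancy tensor.

    point_clouds: list of Nx3 arrays, oldest first, the last one is the current frame.
    poses:        matching list of 4x4 world-from-vehicle matrices.
    voxelize:     a function such as voxelize_bev above.
    """
    T_curr = poses[-1]
    frames = [voxelize(to_current_frame(pc, T_past, T_curr))
              for pc, T_past in zip(point_clouds, poses)]
    return np.stack(frames, axis=0)
```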

Model Formulation

The 4D input tensor is fed to the network, which directly regresses object bounding boxes at the different timestamps; no region proposals are used.

  • Early Fusion

[CC] The backbone is still a VGG16, with some layers trimmed.

We first apply a one-dimensional convolution with kernel size n on the temporal dimension, reducing its length from n down to 1.

– A single 1D conv fuses the temporal sequence.
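[CC] A minimal PyTorch sketch of early fusion, realizing the size-n temporal convolution as a 3D conv with an (n, 1, 1) kernel; the backbone below is just a placeholder for the trimmed VGG16:

```python
import torch.nn as nn

class EarlyFusion(nn.Module):
    """Collapse the temporal axis once, up front, then run a 2D BEV network."""
    def __init__(self, n_frames: int, height_channels: int):
        super().__init__()
        # Input laid out as (B, Z, T, Y, X); an (n, 1, 1) kernel with no temporal
        # padding reduces T from n to 1 in a single step.
        self.temporal = nn.Conv3d(height_channels, height_channels,
                                  kernel_size=(n_frames, 1, 1))
        self.backbone = nn.Sequential(        # stand-in for the trimmed VGG16
            nn.Conv2d(height_channels, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
        )

    def forward(self, x):            # x: (B, Z, T, Y, X)
        x = self.temporal(x)         # -> (B, Z, 1, Y, X)
        x = x.squeeze(2)             # -> (B, Z, Y, X)
        return self.backbone(x)      # BEV feature map
```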

  • Late Fusion

Instead, late fusion applies two layers of 3D convolution with 3 × 3 × 3 kernels and no padding along the temporal dimension, gradually reducing the temporal dimension from n to 1.
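[CC] A sketch of the late-fusion counterpart, assuming n = 5 input frames so that two temporally un-padded 3×3×3 convolutions shrink the temporal axis 5 → 3 → 1; channel widths are my own placeholders:

```python
import torch.nn as nn

class LateFusion(nn.Module):
    """Merge temporal information gradually with two un-padded 3D convs."""
    def __init__(self, height_channels: int, mid_channels: int = 64):
        super().__init__()
        # padding=(0, 1, 1): keep the spatial size, shrink the temporal size by 2 per layer.
        self.fuse = nn.Sequential(
            nn.Conv3d(height_channels, mid_channels, kernel_size=3, padding=(0, 1, 1)),
            nn.ReLU(),
            nn.Conv3d(mid_channels, mid_channels, kernel_size=3, padding=(0, 1, 1)),
            nn.ReLU(),
        )

    def forward(self, x):        # x: (B, Z, T=5, Y, X)
        x = self.fuse(x)         # -> (B, C, T=1, Y, X)
        return x.squeeze(2)      # -> (B, C, Y, X), ready for the 2D detection head
```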

Motion Forecasting


Following Figure 5, we incorporate two distinct convolutional branches into our network architecture. The first branch is designed for binary classification, aiming to estimate the likelihood of an object being a vehicle. The second branch extends predictions beyond the immediate frame by forecasting bounding boxes for not only the current frame but also up to n−1 subsequent frames.

[cc] The two branch networks are not described in any detail here.

Following SSD [17], we assign six predefined bounding boxes to each spatial position of the feature map, denoted a[k,i,j], where i is the row index (ranging from 1 to I), j is the column index (ranging from 1 to J), and k indexes the K predefined box configurations.

[CC] Strong déjà vu of early YOLO.
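[CC] A rough sketch of such a two-branch head on top of the fused BEV features, assuming K = 6 predefined boxes, n output timestamps (current frame plus n−1 forecasts), and 6 regression values per box (l_x, l_y, s_w, s_h, a_sin, a_cos); the 1×1 convs are my own simplification:

```python
import torch
import torch.nn as nn

class DetectionForecastHead(nn.Module):
    """Per-location vehicle probability plus box parameters for n timestamps."""
    def __init__(self, in_channels: int, num_boxes: int = 6, n_timestamps: int = 5):
        super().__init__()
        self.cls_branch = nn.Conv2d(in_channels, num_boxes, kernel_size=1)
        self.reg_branch = nn.Conv2d(in_channels, num_boxes * n_timestamps * 6,
                                    kernel_size=1)

    def forward(self, feat):                           # feat: (B, C, I, J)
        scores = torch.sigmoid(self.cls_branch(feat))  # (B, K, I, J) vehicle probability
        boxes = self.reg_branch(feat)                   # (B, K*n*6, I, J) offsets per box/timestamp
        return scores, boxes
```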

Notice that we do not use predefined heading angles.

[CC] The heading of the BBOX is regressed as a predicted value rather than taken from predefined angles.

For each predefined box a[k,i,j], the network predicts the corresponding normalized location offsets l_x, l_y, the log-normalized sizes s_w, s_h, and the heading parameters a_sin and a_cos.

When detections from the current frame overlap with forecasts made in past frames, they are regarded as the same object and their bounding boxes are simply averaged.
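[CC] A toy sketch of this decode-time association, assuming axis-aligned BEV boxes given as (x1, y1, x2, y2) corners and an IoU threshold that I chose myself:

```python
import numpy as np

def bev_iou(a, b):
    """IoU of two axis-aligned boxes (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / (union + 1e-9)

def merge_with_forecasts(current_dets, past_forecasts, thr=0.5):
    """Average each current detection with the past forecasts it overlaps."""
    merged = []
    for det in current_dets:
        matches = [f for f in past_forecasts if bev_iou(det, f) > thr]
        if matches:
            det = np.mean(np.vstack([det] + matches), axis=0)  # simple average of boxes
        merged.append(det)
    return merged
```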


Loss Function and Training

(equation: overall detection-and-forecasting loss)

where t is the current frame and w represents the model parameters.
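[CC] Schematically, the objective sums a classification term and regression terms over the current frame and the n−1 forecast frames, something along the lines of (α is a balancing weight I added for illustration; the paper gives the exact form):

$$\min_{w}\;\; \ell_{cla}(w) \;+\; \alpha \sum_{t} \ell_{reg}^{\,t}(w)$$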

For classification we use the binary cross-entropy H, computed over all feature-map locations and predefined boxes.

(equation: binary cross-entropy classification loss)

Here i, j, k index the feature-map locations and the predefined boxes; q[i,j,k] is the class label (q[i,j,k] = 1 for a vehicle, q[i,j,k] = 0 for background), and p[i,j,k] is the predicted probability that a vehicle is present at that location.

[CC] The classification loss is a binary cross-entropy that measures whether each BBOX is classified correctly.
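[CC] Writing the binary cross-entropy out with the symbols above (standard form; the paper's sign/normalization convention may differ):

$$\ell_{cla}(w) = -\sum_{i,j,k}\Big[\, q_{i,j,k}\,\log p_{i,j,k} \;+\; (1 - q_{i,j,k})\,\log\big(1 - p_{i,j,k}\big) \Big]$$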

Thus we define the regression targets as

(equation: regression targets l_x, l_y, s_w, s_h, a_sin, a_cos)
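[CC] An SSD-style parameterization consistent with the names l_x, l_y, s_w, s_h, a_sin, a_cos would be (my own reconstruction; the paper's exact normalization may differ):

$$l_x = \frac{x^{GT} - x^{a}}{w^{a}},\quad l_y = \frac{y^{GT} - y^{a}}{h^{a}},\quad s_w = \log\frac{w^{GT}}{w^{a}},\quad s_h = \log\frac{h^{GT}}{h^{a}},\quad a_{sin} = \sin\theta^{GT},\quad a_{cos} = \cos\theta^{GT}$$

where the superscript a denotes the matched predefined box and GT the ground-truth box.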

We use a weighted average of the smooth L1 losses over all regression targets, where the smooth L1 loss is defined as:

$$\text{smooth}_{L1}(x) = \begin{cases} 0.5\,x^{2} & \text{if } |x| < 1 \\ |x| - 0.5 & \text{otherwise} \end{cases}$$

For each predefined box, we determine the ground-truth box with the highest IoU overlap; when this IoU exceeds a preset threshold (typically 0.4), that ground-truth box serves as ā[k,i,j] and we assign q[i,j,k] = 1.
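[CC] A toy sketch of this target assignment, assuming axis-aligned BEV boxes and reusing an IoU helper like bev_iou above, with the 0.4 threshold from the text:

```python
import numpy as np

def assign_targets(predefined_boxes, gt_boxes, iou_fn, thr=0.4):
    """For each predefined box, pick the best-overlapping ground-truth box.

    Returns q (1 = vehicle, 0 = background) and the index of the matched
    ground-truth box (-1 when unmatched); regression targets are then built
    only for the positive entries.
    """
    q = np.zeros(len(predefined_boxes), dtype=np.int64)
    matched = np.full(len(predefined_boxes), -1, dtype=np.int64)
    for idx, box in enumerate(predefined_boxes):
        ious = [iou_fn(box, gt) for gt in gt_boxes]
        if ious and max(ious) > thr:
            q[idx] = 1
            matched[idx] = int(np.argmax(ious))
    return q, matched
```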

