
MP3: A Unified Model to Map, Perceive, Predict and Plan


Introduction

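Motivation: HD maps are not updated often enough, and localization errors also make HD map data unreliable. The paper therefore proposes a LiDAR-based approach that does not depend on a conventional HD map.
Input: raw LiDAR data (a time series of point clouds) plus a routing command (e.g., turn right).
Property: good interpretability. The intermediate layers can be decoded into explicit representations, so each part of the model's behavior corresponds to an inspectable output.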

Overall architecture


The LiDAR input is voxelized at 0.2 m per voxel, so the backbone feature map C has a resolution of 0.8 m per pixel (a 4× downsampling).

Concepts

Drivable area: the surface enclosed by the road boundaries, where the vehicle is allowed to drive.

Intersection: the part of the drivable area that is controlled by traffic lights or traffic signs. Understanding it is essential for handling stop/yield signs and traffic lights.

Reachable lanes: Lane centerlines (also called motion paths) are the typical paths vehicles follow, usually located midway between two lane markings. The reachable lanes are the set of motion paths that the SDV can reach without violating traffic rules. During trajectory planning, the SDV should stay close to the reachable lanes and drive in their direction, which keeps it within the lanes it is legally allowed to use.


Initial Occupancy: a BEV grid cell is marked as occupied if its center lies inside the object's polygon, given the object's shape and current orientation.
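The initial-occupancy rule can be sketched as below. This is an illustrative sketch, not the paper's code: it assumes each object is an oriented rectangle (center, length, width, heading), and the function name `rasterize_box`, the grid origin at (0, 0) and the grid size are hypothetical. A cell counts as occupied exactly when its center falls inside the rectangle.

```python
import math


def rasterize_box(cx, cy, length, width, heading, grid_size, resolution):
    """Initial occupancy of one object as a grid_size x grid_size list of 0/1.

    The object is an oriented rectangle centered at (cx, cy) meters with the
    given heading (radians). Cell (row, col) spans [col*res, (col+1)*res) in x
    and [row*res, (row+1)*res) in y; it is occupied if its center is inside.
    """
    cos_h, sin_h = math.cos(heading), math.sin(heading)
    grid = [[0] * grid_size for _ in range(grid_size)]
    for row in range(grid_size):
        for col in range(grid_size):
            # World coordinates of the cell center.
            x = (col + 0.5) * resolution
            y = (row + 0.5) * resolution
            # Express the offset in the object's frame (along / across heading).
            dx, dy = x - cx, y - cy
            along = dx * cos_h + dy * sin_h
            across = -dx * sin_h + dy * cos_h
            if abs(along) <= length / 2 and abs(across) <= width / 2:
                grid[row][col] = 1
    return grid


occ = rasterize_box(2.0, 2.0, 1.6, 0.8, 0.0, grid_size=10, resolution=0.4)
print(sum(map(sum, occ)))  # 8 cells: 4 along the heading x 2 across
```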

Temporal Motion Field: defined over the occupied pixel locations at each future time step. The motion of each pixel is described by a two-dimensional BEV velocity vector in meters per second. The future is discretized into T time steps of equal duration, every 0.5 s up to a 5 s horizon (i.e., T = 10).


In this paper, we use an occupancy-flow representation made of the occupancy of dynamic objects in their current configuration and a temporal motion field that describes how this occupancy evolves over time. Both are discretized on a BEV spatial grid with a resolution of 0.4 m per pixel, as shown in Figure 4.


Backbone network:

Extracts spatial and semantic features from past LiDAR sweeps. The scene context features produced by the residual stages are denoted C1x, C2x, C4x and C8x, where the subscript is the downsampling factor relative to the input.


Since the LiDAR input is discretized into 0.2 m voxels, C has a spatial resolution of 0.8 m per pixel.
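A minimal voxelization sketch showing how the 0.2 m voxel size relates to the resolution of the downsampled maps C1x to C8x. The grid size, height range, number of height bins and function names are assumptions for illustration; it handles a single sweep, and how past sweeps are combined is not modeled.

```python
def voxelize_bev(points, voxel_size=0.2, grid_size=8, z_min=-2.0, z_max=2.0, n_height_bins=4):
    """Binary voxel grid indexed [height_bin][row][col] from (x, y, z) points in meters.

    Points outside the BEV grid or outside [z_min, z_max) are dropped.
    """
    dz = (z_max - z_min) / n_height_bins
    grid = [[[0] * grid_size for _ in range(grid_size)] for _ in range(n_height_bins)]
    for x, y, z in points:
        col = int(x // voxel_size)
        row = int(y // voxel_size)
        h = int((z - z_min) // dz)
        if 0 <= row < grid_size and 0 <= col < grid_size and 0 <= h < n_height_bins:
            grid[h][row][col] = 1
    return grid


def feature_resolution(voxel_size, downsample):
    """Meters per pixel of the feature map C_{kx}, i.e. the input downsampled k times."""
    return voxel_size * downsample


for k in (1, 2, 4, 8):
    print(f"C{k}x: {feature_resolution(0.2, k):.1f} m/pixel")  # 0.2, 0.4, 0.8, 1.6
```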

Mapping architecture


We model...MD_i as a Laplacian, which we found to be more accurate than a Gaussian. The orientation of...Mθ_i is modeled by...


We model each BEV grid cell of the drivable-area and intersection channels as a Bernoulli random variable, denoted MA_i and MI_i respectively, since each grid cell either belongs to these elements or does not.

【CC】The grid cells of the drivable area and intersection layers follow Bernoulli distributions.

The loss is the negative log-likelihood (NLL) of the ground truth under the predicted distributions: a Laplacian NLL for the distance transform to the reachable lanes (MD), a von Mises NLL for the lane direction (Mθ), and binary cross-entropy for the drivable area (MA) and intersections (MI).


The six output channels are: one for the drivable area, one for intersections, two for the truncated unsigned distance to the reachable lanes (the location and scale of the Laplacian), and two for the direction of the closest reachable lane segment (the location and concentration of the von Mises distribution).

【CC】That is, the six parameters of the three types of objective above.
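The per-cell terms of the map loss can be sketched as follows, pairing each distribution with its output channels: (location, scale) for the Laplacian distance transform MD, (location, concentration) for the von Mises direction Mθ, and one probability each for MA and MI. This is a hedged sketch: the Laplacian choice follows the statement that MD_i is modeled as a Laplacian, while the helper names, the power series used for the Bessel function I0, and the omission of per-cell averaging and loss weights are our assumptions.

```python
import math


def laplace_nll(x, loc, scale):
    """NLL of x under Laplace(loc, scale): log(2*scale) + |x - loc| / scale."""
    return math.log(2.0 * scale) + abs(x - loc) / scale


def bessel_i0(kappa, terms=30):
    """Modified Bessel function I0 via its series sum_m (kappa/2)^(2m) / (m!)^2.

    Adequate for moderate kappa; a library routine is preferable for large kappa.
    """
    total, term = 0.0, 1.0
    for m in range(terms):
        if m > 0:
            term *= (kappa / 2.0) ** 2 / (m * m)
        total += term
    return total


def von_mises_nll(theta, loc, concentration):
    """NLL of angle theta under VonMises(loc, concentration)."""
    return math.log(2.0 * math.pi * bessel_i0(concentration)) - concentration * math.cos(theta - loc)


def bce(target, prob, eps=1e-7):
    """Binary cross-entropy of one Bernoulli cell (drivable area or intersection)."""
    prob = min(max(prob, eps), 1.0 - eps)
    return -(target * math.log(prob) + (1 - target) * math.log(1.0 - prob))


print(laplace_nll(1.0, 0.0, 0.5))  # log(1) + 1 / 0.5 = 2.0
```

In a real network, the scale and concentration channels would need a positivity constraint before entering these formulas; that detail is not covered here.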

The map head takes the backbone feature maps C1x, C2x and C (see Figure 6) and outputs these six channels at a spatial resolution of 0.2 m/pixel.


Perception and Prediction architecture:


We model the presence of dynamic objects O_c for each class c ∈ {vehicle, pedestrian, bicyclist} as a set of Bernoulli random variables O_c(t,i), one per spatio-temporal index (t, i). The occupancy O is trained with a cross-entropy loss with hard negative mining to handle data imbalance, since most spatial locations are unoccupied.
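Cross-entropy with hard negative mining can be sketched as below. The negative-to-positive ratio `neg_ratio` and the function name are hypothetical; the exact mining scheme used in the paper is not given in these notes.

```python
import math


def occupancy_bce_hnm(targets, probs, neg_ratio=3, eps=1e-7):
    """Binary cross-entropy over occupancy cells with hard negative mining.

    targets:   0/1 ground-truth occupancy, flattened over (t, i) for one class.
    probs:     predicted occupancy probabilities, same order.
    neg_ratio: negatives kept per positive (hypothetical value).
    All positive cells are kept; among negatives, only the highest-loss ones.
    """
    pos_losses, neg_losses = [], []
    for t, p in zip(targets, probs):
        p = min(max(p, eps), 1.0 - eps)
        if t == 1:
            pos_losses.append(-math.log(p))
        else:
            neg_losses.append(-math.log(1.0 - p))
    # Hardest negatives = empty cells predicted as occupied with high confidence.
    neg_losses.sort(reverse=True)
    n_neg = max(1, neg_ratio * len(pos_losses))
    kept = pos_losses + neg_losses[:n_neg]
    return sum(kept) / len(kept)


# One positive, keep one negative: the easy negatives (p = 0.1) are ignored.
print(occupancy_bce_hnm([1, 0, 0, 0, 0], [0.5, 0.9, 0.1, 0.1, 0.1], neg_ratio=1))  # (log 2 + log 10) / 2
```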


An agent's future behavior is uncertain and multimodal: for example, a vehicle may go straight or turn right. To model the motion of each class at every spatio-temporal location, we use a categorical distribution K_{c,t,i} over K BEV motion vectors {V_{c,t,i,k}}, with k from 1 to K. The number of modes K is fixed; the distribution over modes is learned in an unsupervised way (there are no mode labels) by minimizing a categorical cross-entropy whose target is the true mode.


Then only the motion vector of the true mode is trained, with a Huber loss. The true mode is the one whose motion vector is closest to the ground-truth motion in L2 distance.
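A sketch of the two motion terms for a single (class, t, i) location: pick the true mode as the predicted vector closest to the ground truth in L2 distance, apply categorical cross-entropy with that mode as the target, and apply a Huber loss to that mode's vector only. The Huber threshold `delta = 1.0`, the summation over the two velocity components and the function names are assumptions.

```python
import math


def huber(residual, delta=1.0):
    """Quadratic for |r| <= delta, linear beyond (continuous at the boundary)."""
    a = abs(residual)
    return 0.5 * a * a if a <= delta else delta * (a - 0.5 * delta)


def motion_loss(mode_logits, mode_vectors, gt_velocity, delta=1.0):
    """Return (true_mode, cross_entropy, huber_regression) for one location.

    mode_logits:  K unnormalized scores of the categorical K_{c,t,i}
    mode_vectors: K predicted BEV velocity vectors (vx, vy) in m/s
    gt_velocity:  ground-truth (vx, vy) in m/s
    """
    # True mode: the predicted vector closest to the ground truth (L2).
    dists = [math.dist(v, gt_velocity) for v in mode_vectors]
    k_star = min(range(len(dists)), key=dists.__getitem__)
    # Categorical cross-entropy with k_star as target, via a stable log-softmax.
    m = max(mode_logits)
    log_z = m + math.log(sum(math.exp(s - m) for s in mode_logits))
    ce = log_z - mode_logits[k_star]
    # Huber loss only on the true mode's vector.
    reg = sum(huber(p - g, delta) for p, g in zip(mode_vectors[k_star], gt_velocity))
    return k_star, ce, reg


print(motion_loss([0.0, 0.0], [(0.0, 0.0), (2.0, 0.0)], (2.5, 0.0)))  # (1, log 2, 0.125)
```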


Within our probabilistic model, to compute the probability of future occupancy we first define the transition probability that occupancy moves from location i_1 to location i_2 in a single time step, from t to t+1. It combines the probabilities of the K motion modes at i_1 with p(V_{t,i1,k} = i_2), which assigns mass locally: it is computed by bilinear interpolation and is non-zero only when i_2 is one of the 4 grid cells nearest to the end point of the continuous motion vector.

The occupancy probability of cell i at time t+1 is then one minus the probability that no occupancy flows into i from any cell j at time t.
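The occupancy-flow step can be sketched on a sparse grid as follows. Under our reading of the description above (an assumption about the exact form), the flow from i_1 to i_2 is the occupancy at i_1 times Σ_k p(K_{t,i1} = k) · p(V_{t,i1,k} = i_2), with p(V_{t,i1,k} = i_2) given by bilinear weights at the end point of the motion vector, and cell i at t+1 is occupied with probability 1 − Π_j (1 − flow_{j→i}). The dict-based grid and the function names are hypothetical; the defaults dt = 0.5 s and 0.4 m/pixel come from the text.

```python
import math


def bilinear_weights(x, y):
    """Split unit mass at continuous grid position (x, y) over its 4 nearest cells.

    Cell (row, col) is centered at (x=col, y=row) in grid units.
    Returns {(row, col): weight}; the weights sum to 1.
    """
    c0, r0 = math.floor(x), math.floor(y)
    fx, fy = x - c0, y - r0
    weights = {}
    for dr, wy in ((0, 1.0 - fy), (1, fy)):
        for dc, wx in ((0, 1.0 - fx), (1, fx)):
            if wx * wy > 0.0:
                weights[(r0 + dr, c0 + dc)] = wx * wy
    return weights


def propagate_occupancy(occ, mode_probs, mode_vectors, dt=0.5, resolution=0.4):
    """One occupancy-flow step from t to t+1.

    occ:          {(row, col): occupancy probability at time t}
    mode_probs:   {(row, col): list of K mode probabilities}
    mode_vectors: {(row, col): list of K (vx, vy) in m/s}, vx along columns
    Returns {(row, col): occupancy probability at t+1} for cells receiving flow.
    """
    no_flow = {}  # product over source cells j of (1 - P(flow j -> i))
    for (r, c), o in occ.items():
        flow = {}  # P(flow (r, c) -> i) for each target cell i
        for p_k, (vx, vy) in zip(mode_probs[(r, c)], mode_vectors[(r, c)]):
            # End point of the motion vector after dt, in grid units.
            x = c + vx * dt / resolution
            y = r + vy * dt / resolution
            for cell, w in bilinear_weights(x, y).items():
                flow[cell] = flow.get(cell, 0.0) + o * p_k * w
        for cell, p in flow.items():
            no_flow[cell] = no_flow.get(cell, 1.0) * (1.0 - p)
    return {cell: 1.0 - q for cell, q in no_flow.items()}


occ_next = propagate_occupancy(
    {(0, 0): 0.5, (0, 2): 0.5},
    {(0, 0): [1.0], (0, 2): [1.0]},
    {(0, 0): [(0.8, 0.0)], (0, 2): [(-0.8, 0.0)]},
)
print(occ_next)  # {(0, 1): 0.75}, i.e. 1 - (1 - 0.5) * (1 - 0.5)
```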


Routing Network

We define the driving command c as a tuple of a high-level action (keep the current lane, turn left, or turn right) and an approximate longitudinal distance d to where the action should be executed. Given the command c and the predicted online map M, the routing module predicts an occupancy probability distribution R over the BEV grid.


Within a command, the high-level action acts as a selector among three networks with identical architecture: one for right turns, one for left turns, and one for going straight. The longitudinal distance of the action is tiled spatially at the same resolution as the online map. Routes are represented as a set of Bernoulli distributions, one per BEV grid cell, and the route-prediction model is trained with binary cross-entropy.
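A small sketch of how the command could be turned into network inputs: the action selects one of three branches and the distance d is tiled into a constant, map-sized channel. The action labels, `route_inputs` and the list-of-grids layout are hypothetical; the branch networks themselves are not shown.

```python
ACTIONS = ("keep_lane", "turn_left", "turn_right")  # hypothetical labels


def route_inputs(action, distance, map_channels):
    """Pick the action-specific branch and append d tiled over the map grid.

    map_channels: list of H x W grids (the online map M).
    Returns (branch_index, channels), where channels has one extra constant layer.
    """
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    height, width = len(map_channels[0]), len(map_channels[0][0])
    # The same scalar distance is repeated at every cell, at map resolution.
    distance_layer = [[float(distance)] * width for _ in range(height)]
    return ACTIONS.index(action), map_channels + [distance_layer]


branch, channels = route_inputs("turn_left", 35.0, [[[0.0] * 4 for _ in range(3)] for _ in range(2)])
print(branch, len(channels))  # 1 3
```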



These are then combined and fed to a CNN that uses Coordinate Convolutions (CoordConv) [29], which lets it reason about the distance of each grid cell relative to the SDV.

【CC】CoordConv is used here to address the lack of positional information in ordinary convolutions.
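CoordConv itself is simple: before a regular convolution, it concatenates channels holding each cell's coordinates. The sketch below appends normalized x and y channels in [-1, 1], which is one common choice; the normalization and the function name are assumptions, and whether MP3's coordinates are centered on the SDV is not stated in these notes.

```python
def add_coord_channels(feature_map):
    """Append normalized column (x) and row (y) coordinate channels in [-1, 1].

    feature_map: list of C grids, each H x W (lists of lists).
    Returns a list of C + 2 grids; a regular convolution would follow.
    """
    h, w = len(feature_map[0]), len(feature_map[0][0])

    def norm(idx, size):
        return 0.0 if size == 1 else 2.0 * idx / (size - 1) - 1.0

    xs = [[norm(c, w) for c in range(w)] for _ in range(h)]
    ys = [[norm(r, h) for _ in range(w)] for r in range(h)]
    return feature_map + [xs, ys]


print(add_coord_channels([[[0, 0, 0], [0, 0, 0]]])[1])  # [[-1.0, 0.0, 1.0], [-1.0, 0.0, 1.0]]
```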

Trajectory Retrieval

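This approach obtains a large set of trajectories from expert demonstrations instead of randomly sampling trajectories or choosing acceleration/steering profiles arbitrarily. The demonstrations are grouped by the SDV's initial state and the trajectories in each group are clustered to build the trajectory set; using the cluster prototypes keeps it efficient. During online motion planning, we retrieve the trajectory set for the given (vx, ax, κx) of the current state.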


The trajectories are binned by the SDV's initial velocity v (m/s), acceleration a (m/s²) and curvature κ (m⁻¹), with bin sizes of 2.0 m/s, 0.5 m/s² and 1/5 m⁻¹ respectively. Within each bin, all but 100 randomly selected trajectories are clustered into 3,000 clusters, and for each cluster only the trajectory closest to the centroid is kept as its prototype.
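The binning and online lookup can be sketched as below, with the bin sizes taken from the text. Assumptions: the function names, curvature in 1/m, flooring to bin indices, and a prototype chosen as the member trajectory with the smallest summed waypoint distance to the cluster mean. The k-means clustering into 3,000 clusters is omitted.

```python
import math
from collections import defaultdict

# Bin sizes from the text: v in m/s, a in m/s^2, curvature in 1/m.
V_BIN, A_BIN, K_BIN = 2.0, 0.5, 1 / 5


def bin_key(v, a, kappa):
    """Discrete bin index of an initial SDV state (v, a, kappa)."""
    return (math.floor(v / V_BIN), math.floor(a / A_BIN), math.floor(kappa / K_BIN))


def build_library(demos):
    """Group expert trajectories by the bin of their initial state.

    demos: iterable of (v, a, kappa, trajectory).
    """
    library = defaultdict(list)
    for v, a, kappa, trajectory in demos:
        library[bin_key(v, a, kappa)].append(trajectory)
    return dict(library)


def retrieve(library, v, a, kappa):
    """Online lookup of the candidate trajectories for the current SDV state."""
    return library.get(bin_key(v, a, kappa), [])


def prototype(cluster):
    """Member trajectory closest to the cluster centroid (mean of waypoints).

    cluster: list of trajectories, each a list of (x, y) waypoints of equal length.
    """
    n = len(cluster)
    centroid = [
        (sum(traj[j][0] for traj in cluster) / n, sum(traj[j][1] for traj in cluster) / n)
        for j in range(len(cluster[0]))
    ]
    return min(cluster, key=lambda traj: sum(math.dist(p, q) for p, q in zip(traj, centroid)))


lib = build_library([(5.0, 0.3, 0.05, "A"), (5.5, 0.2, 0.1, "B"), (1.0, 0.0, 0.0, "C")])
print(retrieve(lib, 4.5, 0.1, 0.0))  # ['A', 'B']
```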


Trajectory Scoring


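Reachable-lanes direction
We favor trajectories that are aligned with the lane direction. To this end, we compute the average difference between the heading of the trajectory point and the predicted angle Mθ over all BEV grid cells overlapped by the SDV polygon.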


Here, m(x) denotes the spatial indices of the BEV grid cells in the map-layer prediction that are overlapped by the SDV polygon at state x.
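The direction cost can be sketched as the mean absolute wrapped angle difference over the cells m(x). Whether the paper uses the absolute difference or another angular distance is an assumption here, as are the function names and the dict representation of Mθ.

```python
import math


def wrap_angle(a):
    """Map an angle in radians to [-pi, pi)."""
    return (a + math.pi) % (2.0 * math.pi) - math.pi


def direction_cost(sdv_heading, m_theta, cells):
    """Mean |heading - M_theta| over the cells m(x), with angles wrapped.

    m_theta: {cell: predicted lane direction (radians)}
    cells:   BEV cells overlapped by the SDV polygon at state x
    """
    if not cells:
        return 0.0
    return sum(abs(wrap_angle(sdv_heading - m_theta[c])) for c in cells) / len(cells)


# Headings just either side of +/-pi are only 0.1 rad apart, not ~2*pi.
print(round(direction_cost(math.pi - 0.05, {(0, 0): -math.pi + 0.05}, [(0, 0)]), 6))  # 0.1
```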


Lane uncertainty


Here, σ_Di is the scale (spread) of the Laplacian distribution over the distance to the nearest reachable lane center, and kθ_i is the concentration parameter of the von Mises distribution over the lane direction.


Occupancy
For the ego vehicle's state at a single time step, we use a cost that penalizes trajectories overlapping occupied areas.


Here, m(x_t) denotes the BEV grid cells overlapped by the SDV polygon at state x_t; the cost is evaluated on the predicted occupancy of each semantic class c.
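A hedged sketch of an occupancy cost over a trajectory. The aggregation is an assumption: per class and time step it takes the highest predicted occupancy under the SDV footprint m(x_t), then sums over classes and time steps; the paper's exact formula may differ.

```python
def occupancy_cost(occupancy, overlapped_cells):
    """Collision-style cost of one candidate trajectory.

    occupancy:        {class_name: {t: {cell: probability}}} predicted occupancy
    overlapped_cells: {t: list of cells m(x_t) covered by the SDV polygon at x_t}
    """
    cost = 0.0
    for per_t in occupancy.values():
        for t, cells in overlapped_cells.items():
            grid = per_t.get(t, {})
            # Highest occupancy under the footprint; unpredicted cells count as 0.
            cost += max((grid.get(c, 0.0) for c in cells), default=0.0)
    return cost


occupancy = {
    "vehicle": {1: {(0, 0): 0.2, (0, 1): 0.9}},
    "pedestrian": {1: {(0, 0): 0.1}},
}
print(occupancy_cost(occupancy, {1: [(0, 0), (0, 1)]}))  # 0.9 + 0.1
```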

【CC】This is essentially collision checking; nothing special to add.

Headway
To compute the headway cost, we identify the set of BEV grid cells m(x_t) that lie within 20 m ahead of the SDV (self-driving vehicle) at time t, and compute the cost over this set.


The expectation in this cost is taken over the motion modes. The function h(x_t, V_{t,i}) indicates whether the safety distance would be violated if an object at spatial index i moving with velocity V_{t,i} brakes hard to a stop while the SDV in state x_t brakes more gently.
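The violation check h and its expectation over motion modes can be sketched with stopping distances under constant deceleration. The decelerations, the minimum gap, the 1-D along-path simplification and the function names are all hypothetical; selecting the cells within 20 m ahead of the SDV is assumed to have happened already.

```python
def headway_violation(sdv_speed, gap, obj_speed, sdv_decel=3.0, obj_decel=8.0, min_gap=2.0):
    """h(x_t, V): 1.0 if a hard stop by the object ahead leaves less than
    min_gap meters to the SDV braking gently, else 0.0.

    gap:       current distance (m) from the SDV front to the object.
    obj_speed: object speed along the SDV heading (m/s) for one motion mode.
    Decelerations (m/s^2) and min_gap (m) are hypothetical values.
    """
    sdv_stop = sdv_speed ** 2 / (2.0 * sdv_decel)            # SDV stopping distance
    obj_stop = max(obj_speed, 0.0) ** 2 / (2.0 * obj_decel)  # object stopping distance
    return 1.0 if gap + obj_stop - sdv_stop < min_gap else 0.0


def expected_headway(sdv_speed, gap, mode_probs, mode_speeds):
    """Expectation of h over the object's motion modes."""
    return sum(p * headway_violation(sdv_speed, gap, v) for p, v in zip(mode_probs, mode_speeds))


# Object 10 m ahead: stopped (p = 0.25) -> violation; moving at 15 m/s (p = 0.75) -> safe.
print(expected_headway(10.0, 10.0, [0.25, 0.75], [0.0, 15.0]))  # 0.25
```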


Reference techniques

Group normalization
https://www.cnblogs.com/jins-note/p/11342565.html
GN is used to avoid the problems that batch normalization has when the number of samples per batch is small.
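Group normalization computes mean and variance per sample over groups of channels, so its statistics do not depend on the batch size. A minimal sketch for one sample, without the learned per-channel scale and shift; the list-of-lists layout and the function name are assumptions.

```python
import math


def group_norm(x, num_groups, eps=1e-5):
    """Normalize one sample: x is a list of C channels, each a flat list of values.

    Channels are split into num_groups contiguous groups, and each group is
    normalized to zero mean and unit variance using only this sample's values.
    """
    c = len(x)
    if c % num_groups:
        raise ValueError("number of channels must be divisible by num_groups")
    size = c // num_groups
    out = []
    for g in range(num_groups):
        group = x[g * size:(g + 1) * size]
        values = [v for ch in group for v in ch]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        inv_std = 1.0 / math.sqrt(var + eps)
        out.extend([[(v - mean) * inv_std for v in ch] for ch in group])
    return out


print(group_norm([[1.0, 2.0], [3.0, 4.0], [10.0, 10.0], [20.0, 20.0]], num_groups=2)[2])  # about [-1.0, -1.0]
```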

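Uber proposed CoordConv to address the weakness of conventional CNNs at tasks that require coordinate transforms. It appends extra channels that encode coordinates, so positional information is preserved and the loss of position information caused by the translation equivariance of pure convolution is avoided. By comparison, a conventional FCN lacks the ability to precisely locate object centers when extracting target regions. The idea is loosely analogous to Word2Vec, which captures semantic information through vector representations; applying it to object detection, especially where localization accuracy matters, is of clear research interest.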

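The Huber loss sits between the L1 and L2 losses and combines their advantages: it is quadratic (like MSE) for small residuals, which gives smooth gradients near the optimum, and linear (like MAE) for large residuals, which bounds the influence of outliers. This makes it robust for regression.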

Max margin loss (Hinge Loss)
https://www.cnblogs.com/yymn/p/8336979.html
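In classification (e.g., SVMs), it penalizes predictions whose margin y·f(x) is below a threshold (usually 1); once the margin exceeds the threshold, the loss is zero and the sample contributes nothing further.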
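The binary hinge loss in its usual form, max(0, margin − y·f(x)), as a minimal sketch (the function name is our own):

```python
def hinge_loss(y, score, margin=1.0):
    """Binary hinge (max-margin) loss for label y in {-1, +1} and real-valued score f(x)."""
    return max(0.0, margin - y * score)


for y, s in [(1, 2.0), (1, 0.5), (-1, 0.5)]:
    # 0.0 beyond the margin, 0.5 inside it, 1.5 when misclassified
    print(y, s, hinge_loss(y, s))
```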

cross entropy
From entropy to KL divergence to cross-entropy: the cross-entropy H(p, q) equals the entropy H(p) plus the KL divergence D_KL(p‖q).
binary cross-entropy loss
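A small sketch of the relationship H(p, q) = H(p) + D_KL(p‖q) for discrete distributions, with binary cross-entropy as the two-class special case. Natural logarithms are used; the function names are our own.

```python
import math


def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def kl_divergence(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def cross_entropy(p, q):
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)


def binary_cross_entropy(y, q):
    """Cross-entropy between Bernoulli(y) and Bernoulli(q)."""
    return cross_entropy([y, 1 - y], [q, 1 - q])


p, q = [0.5, 0.5], [0.9, 0.1]
print(cross_entropy(p, q), entropy(p) + kl_divergence(p, q))  # equal up to rounding
```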

negative loglikelihood (NLL)

Von Mises分布
https://baike.baidu.com/item/冯·米塞斯分布/3733332?fr=aladdin
A probability distribution on the circle, often described as the analogue of the normal distribution for angles.

Dilated Convolution
https://zhuanlan.zhihu.com/p/39542237
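In semantic segmentation, feature maps at different resolutions are usually obtained by downsampling with pooling and then upsampling with deconvolution, using the intermediate feature maps at different scales; another approach is spatial pyramid pooling. Dilated convolution instead enlarges the receptive field by increasing the dilation rate of the kernel, which achieves what downsampling was used for without reducing resolution, and combines multi-level outputs for a more complete view of the context.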
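A 1-D sketch of dilated convolution and of how stacking dilations enlarges the receptive field without downsampling. It uses "valid" cross-correlation (as CNN libraries do) with stride 1; the function names are our own.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """'Valid' 1-D cross-correlation with a dilated kernel (stride 1)."""
    span = (len(kernel) - 1) * dilation + 1  # input extent covered by one output
    return [
        sum(kernel[k] * signal[i + k * dilation] for k in range(len(kernel)))
        for i in range(len(signal) - span + 1)
    ]


def receptive_field(kernel_size, dilations):
    """Receptive field of stacked stride-1 convolutions with the given dilations."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


print(dilated_conv1d([1, 2, 3, 4, 5], [1, 1], dilation=2))  # [4, 6, 8]
print(receptive_field(3, [1, 2, 4]))  # 15
```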
