MP3: A Unified Model to Map, Perceive, Predict and Plan


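Introduction

Motivation: HD maps are not updated promptly, and localization errors make HD-map data unreliable as well. The paper therefore proposes a LiDAR-based solution that does not depend on a conventional HD map.

Input: raw LiDAR data (a time series of point clouds) plus a routing command (for example, turn right).

Feature: good interpretability. The model exposes decodable intermediate layers, so its behavior can be related back to those intermediate representations.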

Main structure

The LiDAR input is voxelized at 0.2 meters per pixel, so the final feature map C has a resolution of 0.8 meters per pixel.

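Concepts

**Drivable area (road surface)**: the region enclosed by the road boundaries (curbs), where vehicles can drive.

Intersection: the part of the drivable area where traffic is controlled by traffic lights or traffic signs. Reasoning about it is essential for handling stop/yield signs and traffic lights.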

Reachable lanes: Lanes are defined by their centerlines (also known as motion paths), which represent the canonical routes vehicles follow, typically midway between two lane markings. The reachable lanes are the set of motion paths the SDV can get to without breaking any traffic rule. When planning a trajectory, the SDV should stay close to these reachable lanes and drive aligned with their direction. The mapping output therefore encodes, for each BEV cell, the truncated unsigned distance to the closest reachable lane centerline and the direction of that centerline segment.

Initial Occupancy: A BEV grid cell is marked as occupied if its center lies inside the polygon given by a dynamic object's shape and pose at the current time step.

Temporal Motion Field: defined over the pixels that are occupied at a given future time. Each pixel's motion is a 2D bird's-eye-view velocity vector in meters per second. The motion field is discretized into T time steps of equal duration, one every 0.5 seconds up to 5 seconds into the future.

In this paper, we introduce an occupancy-flow representation that combines the initial occupancy of dynamic objects with a temporal motion field describing how that occupancy evolves over time. Both are discretized on a spatial BEV grid with a resolution of 0.4 meters per pixel, as shown in Figure 4.

Backbone network:

Extracts spatial and semantic features from the past LiDAR sweeps. The scene context features after each residual stage are denoted C1x, C2x, C4x, and C8x, where the subscript is the downsampling factor relative to the input.

Since the LiDAR input is voxelized at 0.2 meters per pixel, the final feature map C has a spatial resolution of 0.8 meters per pixel, i.e. a downsampling factor of 4.
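The relation between voxel size and feature resolution is simple arithmetic, sketched here in Python. The helper name `feature_resolution` is made up for illustration; only the 0.2 m voxel size and the downsampling factors come from the text.

```python
VOXEL_SIZE_M = 0.2  # input voxel size stated in the text, metres per pixel


def feature_resolution(downsample_factor, voxel_size=VOXEL_SIZE_M):
    """Metres per pixel of a feature map downsampled by the given factor."""
    return voxel_size * downsample_factor


# Resolution of each residual-stage output C1x, C2x, C4x, C8x.
resolutions = {f"C{f}x": feature_resolution(f) for f in (1, 2, 4, 8)}
print(resolutions)
```

Under this reading, the 0.8 m/pixel feature map corresponds to a factor of 4, and the 0.4 m/pixel occupancy grid to a factor of 2.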

Mapping architecture

We model...MD_i as a Laplacian, which we found to be more accurate than a Gaussian. The orientation of...Mθ_i is modeled by...

We characterize each BEV grid cell in the drivable area and intersection channels as Bernoulli random variables, denoted as MA_i and MI_i respectively, since we assume that each grid cell either belongs to these elements or does not belong to them.

[CC] The grid cells of the drivable-area and intersection channels each follow a Bernoulli distribution.

The map layers are trained with the negative log-likelihood (NLL) under the data distribution defined by the model's output parameters. Specifically, a Gaussian NLL is used for the distance transform to the reachable lanes (MD), a Von Mises NLL for the direction of traffic (Mθ), and binary cross-entropy for the drivable area (MA) and junctions (MJ).

The six output channels are: one channel for the drivable area, one channel for intersections, two channels for the mean and variance of the Gaussian over the truncated unsigned distance to the reachable lanes, and two channels for the location and concentration of the Von Mises distribution over the angle of the closest reachable lane segment.

[CC] In other words, these are the six parameters of the three objectives above.
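As a rough illustration of how six channels map onto the three NLL terms, here is a minimal NumPy sketch. The channel order, the function names and the equal weighting of the terms are assumptions made for this example, not details taken from the paper.

```python
import numpy as np


def gaussian_nll(x, mean, var):
    """Negative log-likelihood of x under N(mean, var)."""
    return 0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)


def von_mises_nll(theta, loc, kappa):
    """Negative log of exp(kappa * cos(theta - loc)) / (2 * pi * I0(kappa))."""
    return np.log(2.0 * np.pi * np.i0(kappa)) - kappa * np.cos(theta - loc)


def bernoulli_nll(label, prob, eps=1e-7):
    """Binary cross-entropy for a single Bernoulli variable."""
    prob = np.clip(prob, eps, 1.0 - eps)
    return -(label * np.log(prob) + (1.0 - label) * np.log(1.0 - prob))


def map_loss(pred, target):
    """pred: (6, H, W) in the assumed order
    [p_area, p_intersection, dist_mean, dist_var, angle_loc, angle_kappa];
    target: (4, H, W) as [area, intersection, distance, angle]."""
    p_area, p_inter, d_mean, d_var, th_loc, th_kappa = pred
    area, inter, dist, theta = target
    per_cell = (
        bernoulli_nll(area, p_area)
        + bernoulli_nll(inter, p_inter)
        + gaussian_nll(dist, d_mean, d_var)
        + von_mises_nll(theta, th_loc, th_kappa)
    )
    return float(per_cell.mean())
```

A prediction whose parameters match the labels yields a lower `map_loss` than one that does not, which is the only property this sketch is meant to show.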

The mapping head takes several backbone feature maps, $C_{1x}$, $C_{2x}$, and $C$ (see Figure 6), and outputs the six channels at a spatial resolution of $0.2\ \mathrm{m/pixel}$.

Perception and Prediction architecture:

We model the presence of dynamic objects Oc for each class c ∈ {vehicle, pedestrian, bicyclist} as a set of Bernoulli random variables Oc(t,i), one for each spatio-temporal index (t, i). The occupancy O of dynamic objects is trained with a cross-entropy loss with hard negative mining to address the data imbalance (most of the space is unoccupied).
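Hard negative mining can be sketched as follows: keep every occupied cell, but only the highest-loss unoccupied cells. The 3:1 negative-to-positive ratio and the function name are assumptions for illustration; the text does not state how many negatives are kept.

```python
import numpy as np


def occupancy_loss(prob, label, neg_per_pos=3, eps=1e-7):
    """Cross-entropy over all positives plus the hardest negatives."""
    prob = np.clip(prob.ravel(), eps, 1.0 - eps)
    label = label.ravel().astype(bool)
    ce = -np.where(label, np.log(prob), np.log(1.0 - prob))
    pos, neg = ce[label], ce[~label]
    # Keep at most neg_per_pos negatives per positive (at least one).
    n_neg = min(neg.size, max(1, neg_per_pos * pos.size))
    hard_neg = np.sort(neg)[::-1][:n_neg]  # largest losses first
    kept = np.concatenate([pos, hard_neg])
    return float(kept.mean())
```

With one positive and `neg_per_pos=1`, only the single most confidently wrong empty cell contributes alongside the positive.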

An agent's future behavior is inherently uncertain and multi-modal: a vehicle may, for example, go straight or turn right. To capture this, the motion of each class at every spatio-temporal location is modeled as a categorical distribution $K_{c,t,i}$ over $K$ BEV motion vectors $\{V_{c,t,i,k}\}$, with $k$ ranging from $1$ to $K$. Because there are no ground-truth mode labels, the mode probabilities are learned in an unsupervised fashion by minimizing a categorical cross-entropy loss.

The true mode is defined as the one whose motion vector is closest to the ground-truth motion in the L2 sense. It serves as the cross-entropy target, and only the motion vector of the true mode is then trained with a Huber loss.
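The mode selection and the two loss terms for a single grid cell might look like the sketch below. The function names, the Huber threshold `delta` and the way the two terms are returned separately are choices made for this example.

```python
import numpy as np


def huber(residual, delta=1.0):
    """Quadratic for |r| <= delta, linear beyond it."""
    a = np.abs(residual)
    return np.where(a <= delta, 0.5 * a ** 2, delta * (a - 0.5 * delta))


def motion_loss(mode_logits, mode_vectors, gt_vector, delta=1.0):
    """mode_logits: (K,), mode_vectors: (K, 2), gt_vector: (2,)."""
    # The "true" mode is the one whose vector is closest to the ground truth.
    dists = np.linalg.norm(mode_vectors - gt_vector, axis=1)
    true_mode = int(np.argmin(dists))
    # Categorical cross-entropy against that mode (stable log-softmax).
    shifted = mode_logits - mode_logits.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    cross_entropy = -log_probs[true_mode]
    # Only the selected mode's vector receives the regression loss.
    regression = huber(mode_vectors[true_mode] - gt_vector, delta).sum()
    return true_mode, float(cross_entropy), float(regression)
```

Because the other modes receive no regression gradient, they remain free to specialize on different behaviors.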

To compute the probability of future occupancy under our probabilistic model, we first define the probability that occupancy flows from location $i_1$ at time $t$ to location $i_2$ at time $t+1$.

In this transition probability, the term p(Vt,i1,k = i2) distributes the mass locally: it is determined by bilinear interpolation when i2 is among the 4 grid cells nearest to the continuous end point of the motion vector, and is zero for all other cells.
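The equation itself is not present in these notes. Based only on the surrounding description (occupancy at the source cell, a categorical distribution over $K$ modes, and the interpolation term), a plausible form is the following; the symbol $F$ and the omission of the class index $c$ are assumptions of this reconstruction.

```latex
F_{(t,i_1)\rightarrow(t+1,i_2)}
  = O_{t,i_1} \sum_{k=1}^{K} p\left(K_{t,i_1}=k\right)\, p\left(V_{t,i_1,k}=i_2\right)
```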

The probability that cell i is occupied at time step t + 1 is then obtained by computing the probability that no flow into i occurs from any cell j at time t, and subtracting that value from one.
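One propagation step can be sketched in NumPy as below. It assumes motion vectors are already expressed as per-step displacements in grid cells (velocity times step duration divided by cell size), and it uses plain Python loops for readability rather than speed; all names are illustrative.

```python
import numpy as np


def bilinear_targets(y, x, shape):
    """Yield ((row, col), weight) for the cells around continuous (y, x)."""
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    for dy in (0, 1):
        for dx in (0, 1):
            r, c = y0 + dy, x0 + dx
            w = (1 - abs(y - r)) * (1 - abs(x - c))
            if w > 0 and 0 <= r < shape[0] and 0 <= c < shape[1]:
                yield (r, c), w


def propagate(occ, mode_prob, motion):
    """occ: (H, W); mode_prob: (H, W, K); motion: (H, W, K, 2) as (dy, dx)."""
    H, W, K = mode_prob.shape
    no_flow = np.ones((H, W))  # probability that nothing flows into a cell
    for r in range(H):
        for c in range(W):
            if occ[r, c] == 0:
                continue
            flow = {}  # target cell -> sum over modes of p(mode) * weight
            for k in range(K):
                dy, dx = motion[r, c, k]
                for cell, w in bilinear_targets(r + dy, c + dx, (H, W)):
                    flow[cell] = flow.get(cell, 0.0) + mode_prob[r, c, k] * w
            for cell, f in flow.items():
                no_flow[cell] *= 1.0 - occ[r, c] * f
    return 1.0 - no_flow  # one minus "no source flows here"
```

Applying `propagate` repeatedly, with the motion field of each time step, rolls the occupancy forward over the prediction horizon.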

Routing Network

We assume that the driving command c is a tuple consisting of a high-level action (keep the current lane, turn left, or turn right) and an approximate longitudinal distance d to where the action should be executed. Given this driving command c and the predicted map M, the routing module predicts a dense probability map R over the bird's-eye view.

The high-level action of the command acts as a selector among three networks with identical architecture (one for right turns, one for left turns, and one for going straight). The longitudinal distance to the action is tiled spatially so that it matches the resolution of the online map. The route is represented as a set of Bernoulli distributions, one per BEV grid cell, and the routing model is trained with a binary cross-entropy loss.

The tiled distance and the online map are then concatenated and used as input to a CNN with Coordinate Convolutions (CoordConv) [29], which lets the network reason about the distance of each grid cell from the SDV.

[CC] CoordConv is used here to make up for the lack of position information in plain convolutions.
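A minimal sketch of how the routing input could be assembled and dispatched is given below. The action names, the dictionary of networks and the plain callables standing in for CNNs are all placeholders invented for the example; the CoordConv coordinate channels are left out here.

```python
import numpy as np

ACTIONS = ("keep_lane", "turn_left", "turn_right")  # assumed action labels


def build_routing_input(online_map, distance_m):
    """Tile the scalar distance over the map grid and stack it as a channel.

    online_map: (C, H, W) predicted map layers; returns (C + 1, H, W)."""
    _, H, W = online_map.shape
    d_channel = np.full((1, H, W), distance_m, dtype=online_map.dtype)
    return np.concatenate([online_map, d_channel], axis=0)


def route(online_map, command, networks):
    """command: (action, distance); networks: dict mapping action -> callable."""
    action, distance_m = command
    net = networks[action]  # the action selects one of the three networks
    return net(build_routing_input(online_map, distance_m))
```

Usage would look like `route(M, ("turn_left", 30.0), networks)`, where `networks` holds one model per action.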

Trajectory Retrieval

The candidate trajectories come from a large dataset of expert demonstrations, which avoids random sampling or arbitrarily chosen acceleration/steering profiles. The dataset is built by binning the demonstrations by the SDV's initial state, clustering the trajectories within each bin, and keeping the cluster prototypes for efficiency. During online motion planning, the trajectories of the bin specified by (vx, ax, κx), i.e. the current state, are retrieved.

Bin the trajectories by initial velocity v (in m/s), curvature κ (in 1/m), and acceleration a (in m/s²). The bin sizes are 2.0 m/s for velocity, 1/5 for curvature, and about 0.5 m/s² for acceleration. Within each bin, all but one hundred randomly selected samples are clustered into 3,000 clusters, and only the trajectory closest to each cluster prototype (centroid) is retained.
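The binning and lookup step can be sketched with a dictionary keyed by the discretized initial state. The bin sizes are copied from the text as written; the function names are hypothetical, and the clustering into prototypes is deliberately omitted, so this index simply stores whatever trajectories it is given.

```python
import math
from collections import defaultdict

# (velocity, acceleration, curvature) bin sizes, as given in the text.
BIN_SIZES = (2.0, 0.5, 0.2)


def bin_key(v, a, kappa, sizes=BIN_SIZES):
    """Discretise an initial state (v, a, kappa) into an integer bin key."""
    return tuple(math.floor(val / size) for val, size in zip((v, a, kappa), sizes))


def build_index(demos):
    """demos: iterable of ((v, a, kappa), trajectory) pairs."""
    index = defaultdict(list)
    for state, trajectory in demos:
        index[bin_key(*state)].append(trajectory)
    return index


def retrieve(index, v, a, kappa):
    """Return the stored trajectories of the bin matching the current state."""
    return index.get(bin_key(v, a, kappa), [])
```

In a full pipeline each bin's list would hold the cluster prototypes rather than raw demonstrations.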

Trajectory Scoring

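Reachable-lanes direction
We promote trajectories that stay aligned with the lane direction. To this end, the cost is the average difference between the heading of the trajectory point and the angle Mθ over all BEV grid cells that overlap with the SDV polygon.

Here $m(x)$ denotes the spatial indices of the BEV grid cells in the map-layer prediction that overlap with the SDV polygon at state $x$.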
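A small NumPy sketch of this cost for a single trajectory point follows. Representing the overlapped cells $m(x)$ as a boolean mask, and wrapping the angle difference into $[0, \pi]$, are assumptions of the example.

```python
import numpy as np


def angle_diff(a, b):
    """Smallest absolute difference between two angles, in [0, pi]."""
    return np.abs((a - b + np.pi) % (2.0 * np.pi) - np.pi)


def lane_direction_cost(heading, lane_angle_map, footprint_mask):
    """Mean |heading - M_theta| over the cells covered by the SDV polygon.

    lane_angle_map: (H, W) predicted lane direction per cell, in radians;
    footprint_mask: (H, W) boolean mask of the cells in m(x)."""
    cells = lane_angle_map[footprint_mask]
    return float(angle_diff(heading, cells).mean())
```

Wrapping matters near ±π: headings of 3.1 and -3.1 rad differ by about 0.08 rad, not 6.2.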

Lane uncertainty

Here, σ_Di is the standard deviation of the Gaussian distribution over the distance to the nearest reachable lane centerline, and κθ_i is the concentration parameter of the von Mises distribution over the lane direction.

Occupancy rate
For the SDV state at each time step, we use a cost function that penalizes trajectories overlapping with occupied regions.

Here m(xt) denotes the BEV grid cells of the occupancy layer for semantic class c that overlap with the SDV polygon at state xt.

[CC] Collision checking; nothing more to say.
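As an illustration only, the sketch below sums, over time steps, the highest occupancy probability found under the SDV footprint. Taking the maximum over the overlapped cells is an assumption of this example, since the cost equation itself is not reproduced in these notes.

```python
import numpy as np


def occupancy_cost(occupancy, footprints):
    """occupancy: (T, H, W) occupancy probabilities for one class;
    footprints: (T, H, W) boolean masks of the SDV polygon per time step."""
    cost = 0.0
    for occ_t, mask_t in zip(occupancy, footprints):
        if mask_t.any():
            # Worst-case occupancy among the cells the SDV would cover.
            cost += float(occ_t[mask_t].max())
    return cost
```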

Headway
To compute the headway cost, we retrieve the set of BEV grid cells m(xt) that lie within 20 meters ahead of the SDV (the self-driving vehicle) at time t.

The expectation is taken over the motion modes. The function h(x_t, V_{t,i}) measures the violation of the safety distance that would occur if the object at spatial index i, moving with velocity V_{t,i}, stopped abruptly with hard braking while the SDV in state x_t reacted with a more gradual, comfortable deceleration.
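One way to make the braking scenario concrete is with constant-deceleration stopping distances, as in the sketch below. The deceleration values, the 2 m margin, the hinge form of the violation and all names are assumptions for illustration, not values from the paper.

```python
def headway_violation(gap_m, v_sdv, v_obj,
                      hard_decel=8.0, comfort_decel=4.0, margin_m=2.0):
    """Safety-distance violation (metres) if the lead object brakes hard."""
    obj_stop = v_obj ** 2 / (2.0 * hard_decel)     # lead object's stopping distance
    sdv_stop = v_sdv ** 2 / (2.0 * comfort_decel)  # SDV's gentler stopping distance
    final_gap = gap_m + obj_stop - sdv_stop        # gap once both have stopped
    return max(0.0, margin_m - final_gap)


def headway_cost(gap_m, v_sdv, mode_probs, mode_speeds, occupancy=1.0):
    """Expected violation over the object's motion modes, weighted by occupancy."""
    expected = sum(p * headway_violation(gap_m, v_sdv, v)
                   for p, v in zip(mode_probs, mode_speeds))
    return occupancy * expected
```

A mode in which the lead object is already stationary contributes the largest violation, which is why the expectation over modes matters.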

Reference techniques

Group normalization
https://www.cnblogs.com/jins-note/p/11342565.html
GN is used to avoid the problems that arise when a batch contains too few samples.

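CoordConv, proposed by Uber, targets a weakness of conventional CNNs in handling coordinate transforms. It adds extra coordinate channels to the input so that position information is preserved, which overcomes the loss of position information caused by the translation invariance of plain convolutions. A conventional FCN, by comparison, cannot precisely localize an object's center coordinates when extracting target regions. The idea is comparable to how Word2Vec uses word vectors to capture semantic information, and it is worth exploring for object detection, especially where localization accuracy matters.

Huber loss is an objective that sits between the L1 and L2 norms and combines the advantages of both. It is quadratic for small residuals, which keeps gradients smooth near the optimum, and linear for large residuals, which bounds the influence of outliers (as mean absolute error, MAE, does). This makes it robust in regression tasks.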
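The core of CoordConv is just appending coordinate channels before a normal convolution. A minimal NumPy sketch of that step, with coordinates normalized to [-1, 1] (a common convention, assumed here), is:

```python
import numpy as np


def add_coord_channels(features):
    """features: (C, H, W) -> (C + 2, H, W) with row and column coordinates."""
    _, H, W = features.shape
    rows = np.linspace(-1.0, 1.0, H).reshape(H, 1).repeat(W, axis=1)
    cols = np.linspace(-1.0, 1.0, W).reshape(1, W).repeat(H, axis=0)
    # Every cell now carries its own (row, col) position as two extra features.
    return np.concatenate([features, rows[None], cols[None]], axis=0)
```

Any convolution applied to the result can read a cell's position directly, which is what lets the routing network reason about distances from the SDV.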

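Max margin loss (Hinge Loss)
https://www.cnblogs.com/yymn/p/8336979.html
Used in classification (for example, in support vector machines): it enforces a margin between the prediction and the decision boundary, and once a sample lies beyond the preset margin it contributes nothing further to the loss.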
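For a binary label in {-1, +1}, the hinge loss reduces to a one-liner; the default margin of 1 is the usual SVM convention and is assumed here.

```python
def hinge_loss(score, label, margin=1.0):
    """Hinge loss for a raw score and a label in {-1, +1}."""
    # Zero once the sample is on the correct side by at least `margin`.
    return max(0.0, margin - label * score)
```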

cross entropy
The chain of ideas goes from entropy to KL divergence (Kullback-Leibler divergence), and from there to cross-entropy.
binary cross-entropy loss
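For reference, the standard relations between these quantities, with $p$ the target distribution, $q$ the prediction, $y \in \{0, 1\}$ a binary label and $\hat{p}$ the predicted probability, are:

```latex
H(p) = -\sum_x p(x)\,\log p(x), \qquad
D_{\mathrm{KL}}(p \,\|\, q) = \sum_x p(x)\,\log\frac{p(x)}{q(x)}

H(p, q) = -\sum_x p(x)\,\log q(x) = H(p) + D_{\mathrm{KL}}(p \,\|\, q)

\mathrm{BCE}(y, \hat{p}) = -\bigl[\, y \log \hat{p} + (1 - y)\,\log(1 - \hat{p}) \,\bigr]
```

Since $H(p)$ does not depend on the model, minimizing cross-entropy is equivalent to minimizing the KL divergence to the target.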

negative log-likelihood (NLL)

Von Mises distribution
https://baike.baidu.com/item/冯·米塞斯分布/3733332?fr=aladdin
The probability density on the angular (circular) domain that plays the role of the normal distribution.

Dilated Convolution
https://zhuanlan.zhihu.com/p/39542237
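To obtain feature maps at different resolutions in semantic segmentation, the usual approach downsamples with pooling, upsamples again with deconvolution, and makes use of the intermediate feature maps at different scales; spatial pyramid pooling is another option. Dilated convolution takes a different route: it enlarges the receptive field by changing the dilation rate of the kernel (serving the purpose of downsampling without reducing resolution) and combines multi-level outputs to capture richer context.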
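A 1D example makes the receptive-field effect easy to see. The sketch below implements an unpadded dilated convolution (in the deep-learning sense, i.e. without flipping the kernel); the function name is illustrative.

```python
import numpy as np


def dilated_conv1d(signal, kernel, dilation=1):
    """Unpadded 1D dilated convolution (cross-correlation form)."""
    span = (len(kernel) - 1) * dilation + 1  # receptive field of one output
    out_len = len(signal) - span + 1
    return np.array([
        sum(kernel[j] * signal[i + j * dilation] for j in range(len(kernel)))
        for i in range(out_len)
    ])
```

With a 3-tap kernel, a dilation of 2 widens the receptive field from 3 to 5 samples without adding any weights.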
