
LCDNet: Deep Loop Closure Detection and Point Cloud Registration for LiDAR SLAM


Motivation

This work targets the loop closure detection problem in SLAM.

Overall Network Architecture

LCDNet is built from three main components: a shared encoder, a place recognition head that extracts global descriptors, and a relative pose module that estimates the transformation between two point clouds. The relative pose module is grounded in unbalanced optimal transport theory and implemented in a differentiable manner, enabling end-to-end training.

[Figure: overall LCDNet architecture]

Feature Extraction

The feature extraction stream is built on PV-RCNN. It takes an input point cloud P, where each point has four attributes (the x, y, z coordinates and the intensity), and produces a set of N keypoint descriptors F^P = {f_1^P, …, f_N^P}, where each f_i^P ∈ ℝ^D is a D-dimensional vector associated with the i-th keypoint.

We employ the Farthest Point Sampling (FPS) algorithm [54] to subsample the point cloud and select N uniformly distributed keypoint locations. In the Voxel Set Abstraction (VSA) module, the raw points are combined with the outputs of four feature layers to form a feature vector for each keypoint.
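As a minimal illustration of the FPS step (this is a sketch, not the authors' implementation; the toy cloud size and keypoint count are made up):

```python
import numpy as np

def farthest_point_sampling(points, n_samples, seed=0):
    """Greedy FPS: repeatedly pick the point farthest from the set chosen so far."""
    rng = np.random.default_rng(seed)
    n = points.shape[0]
    chosen = np.empty(n_samples, dtype=int)
    chosen[0] = rng.integers(n)
    # dist[i] = distance from point i to the nearest already-chosen keypoint
    dist = np.linalg.norm(points - points[chosen[0]], axis=1)
    for k in range(1, n_samples):
        chosen[k] = int(np.argmax(dist))
        dist = np.minimum(dist, np.linalg.norm(points - points[chosen[k]], axis=1))
    return chosen

# toy cloud: 1000 random 3-D points, keep 16 uniformly spread keypoints
pts = np.random.default_rng(42).standard_normal((1000, 3))
idx = farthest_point_sampling(pts, 16)
```

Because each new keypoint maximizes the distance to all previously chosen ones, the selected points end up roughly uniformly distributed over the cloud, which is exactly why FPS is preferred over random subsampling here.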

[figure]

For each keypoint kp_i and each level l of the pyramid feature map, the keypoint features f_{l,i} are computed as

f_{l,i} = MP(G(M(S_{l,i})))

where MP denotes the max pooling operation, G is a multi-layer perceptron (MLP), and M randomly samples the neighboring voxel feature set S_{l,i}, which is computed as

S_{l,i} = { [f^l_{voxel,j} ; v_{l,j} − kp_i] : ‖v_{l,j} − kp_i‖ < r }

where f^l_{voxel,j} denotes the feature of voxel j at level l, v_{l,j} denotes the coordinates of voxel j at level l, and r is the neighborhood radius. This operation is performed at every level of the pyramid, producing

f_i^{pv} = [f_{1,i}, f_{2,i}, f_{3,i}, f_{4,i}]

We apply analogous aggregation to the raw input points and the BEV feature map to obtain additional keypoint features.

f_i = [f_i^{pv}, f_i^{raw}, f_i^{bev}]

Finally, an MLP aggregates these features and generates the final keypoint feature vectors as

f_i^P = G(f_i)

Global Descriptor

The NetVLAD layer transforms the (N × D)-dimensional set F^P into a (K × D)-dimensional vector V(F^P) by learning K cluster centers {c_1, …, c_K}, where each c_k ∈ ℝ^D.
NetVLAD can be seen as a differentiable, trainable generalization of K-means clustering.
A soft assignment is defined as

a_k(f_i^P) = exp(w_k^⊤ f_i^P + b_k) / Σ_{k'} exp(w_{k'}^⊤ f_i^P + b_{k'})

where w_k ∈ ℝ^D and b_k ∈ ℝ are learnable weights and biases. Intuitively, a_k(f_i^P) is the probability of assigning the feature vector f_i^P to the centroid c_k.

The final NetVLAD descriptor V(F^P) = [V_1(F^P), …, V_K(F^P)] aggregates the features of each frame by combining the original VLAD formulation with the soft assignment above:

V_k(F^P) = Σ_{i=1}^{N} a_k(f_i^P) (f_i^P − c_k)
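A minimal NumPy sketch of this soft-assignment-plus-residual aggregation (the shapes and toy inputs are assumptions for illustration, not the paper's implementation):

```python
import numpy as np

def netvlad(F, centers, W, b):
    """F: (N, D) keypoint features; centers: (K, D); W: (K, D); b: (K,).
    Soft-assign each feature to the K centers, then accumulate residuals."""
    logits = F @ W.T + b                       # (N, K): w_k^T f_i + b_k
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    a = np.exp(logits)
    a /= a.sum(axis=1, keepdims=True)          # soft assignment a_k(f_i)
    # V_k = sum_i a_k(f_i) * (f_i - c_k)
    V = np.einsum('nk,nkd->kd', a, F[:, None, :] - centers[None, :, :])
    return V  # (K, D)

# sanity check: with a single center at the origin and zero assignment weights,
# the soft assignment is trivially 1 and V reduces to the plain feature sum
F = np.random.default_rng(1).standard_normal((10, 4))
V = netvlad(F, np.zeros((1, 4)), np.zeros((1, 4)), np.zeros(1))
```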

We use a simple multi-layer perceptron (MLP) to compress the (K × D)-dimensional V(F^P) into a compact G-dimensional descriptor. A Context Gating (CG) module is then applied to the MLP output to produce the final global descriptor f(P) ∈ ℝ^G. The key step is re-weighting the MLP output with a self-attention-style gate:

Y = σ(W X + b) ⊙ X

where X is the MLP output, σ is the element-wise sigmoid, ⊙ denotes element-wise multiplication, and W and b are the learnable weights and biases.
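The gating operation above is a one-liner; here is a NumPy sketch (the toy X, W, b values are made up):

```python
import numpy as np

def context_gating(X, W, b):
    """Context Gating: Y = sigmoid(W X + b) ⊙ X.
    The output X gates itself, re-weighting each dimension by a value in (0, 1)."""
    gate = 1.0 / (1.0 + np.exp(-(W @ X + b)))
    return gate * X

# with zero weights the gate is sigmoid(0) = 0.5 everywhere, so Y = 0.5 * X
X = np.array([1.0, -2.0, 3.0])
Y = context_gating(X, np.zeros((3, 3)), np.zeros(3))
```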

Relative Pose Estimation

We employ the Sinkhorn algorithm. The discrete Kantorovich formulation of the optimal transport problem is defined as follows.

min_{T ≥ 0} Σ_{i,j} C_{i,j} T_{i,j}   s.t.  T 1 = μ,  T^⊤ 1 = ν

where C_{i,j} denotes the cost of matching the i-th element of F^P with the j-th element of F^S. To apply the Sinkhorn algorithm, we add an entropic regularization term:

min_{T ≥ 0} Σ_{i,j} C_{i,j} T_{i,j} + λ Σ_{i,j} T_{i,j} log T_{i,j}

where λ regulates the sparsity of the resulting matching (as λ approaches zero, T tends toward a bijective mapping).
However, Eqs. (9) and (10) require T to be a (scaled) doubly stochastic matrix, i.e., mass must be conserved: every point in F^P must be matched to one or more points in F^S, and vice versa.
One way to relax these constraints is to reformulate the problem as unbalanced optimal transport (UOT), which permits creation and destruction of mass. It is defined as

min_{T ≥ 0} Σ_{i,j} C_{i,j} T_{i,j} + λ Σ_{i,j} T_{i,j} log T_{i,j} + ρ KL(T 1 ‖ U) + ρ KL(T^⊤ 1 ‖ U)

where KL denotes the Kullback-Leibler divergence, U is the discrete uniform distribution, and ρ is a parameter governing the degree of mass preservation.

[Algorithm 1: Sinkhorn iterations for unbalanced optimal transport]
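As a rough illustration of how such a UOT plan can be computed, here is a generalized Sinkhorn scaling sketch (in the style of Chizat et al.'s iterations; this is not the paper's Algorithm 1, and the eps, rho values and toy inputs are made up):

```python
import numpy as np

def unbalanced_sinkhorn(C, mu, nu, eps=0.1, rho=10.0, n_iter=500):
    """Entropic unbalanced OT via generalized Sinkhorn scaling.
    The KL marginal penalties soften the usual marginal projections by the
    exponent rho / (rho + eps); rho -> infinity recovers balanced Sinkhorn."""
    K = np.exp(-C / eps)
    u = np.ones_like(mu)
    v = np.ones_like(nu)
    p = rho / (rho + eps)
    for _ in range(n_iter):
        u = (mu / (K @ v)) ** p
        v = (nu / (K.T @ u)) ** p
    return u[:, None] * K * v[None, :]   # transport plan T

rng = np.random.default_rng(0)
C = rng.random((6, 8))                   # toy cost matrix
mu = np.full(6, 1.0 / 6)                 # uniform source marginal
nu = np.full(8, 1.0 / 8)                 # uniform target marginal
T = unbalanced_sinkhorn(C, mu, nu)
```

With a large rho the marginals of T nearly match mu and nu; shrinking rho lets the plan drop mass on points that have no good match, which is the behavior the UOT relaxation is after.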

We define the cost matrix C using a cosine-based distance measure between the keypoint descriptors.

C_{i,j} = 1 − (f_i^P · f_j^S) / (‖f_i^P‖ ‖f_j^S‖)

Having estimated the unbalanced optimal transport plan T, we compute, for each keypoint p_j in P, its projected position in the point cloud S:

p̂_j = (Σ_i T_{j,i} s_i) / (Σ_i T_{j,i})

We then apply weighted singular value decomposition (SVD) to compute the rigid-body transformation between the original point cloud P and its projection onto S.
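A minimal NumPy sketch of this weighted-SVD (Kabsch) step on toy noiseless correspondences (a generic illustration, not the authors' implementation):

```python
import numpy as np

def weighted_rigid_transform(P, Q, w):
    """Weighted Kabsch/SVD: find R, t minimizing sum_i w_i ||R p_i + t - q_i||^2."""
    w = w / w.sum()
    p_bar = w @ P                                   # weighted centroid of P
    q_bar = w @ Q                                   # weighted centroid of Q
    H = (P - p_bar).T @ ((Q - q_bar) * w[:, None])  # weighted cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))          # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = q_bar - R @ p_bar
    return R, t

# recover a known rotation about z and a translation from exact correspondences
theta = 0.3
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([1.0, -2.0, 0.5])
P = np.random.default_rng(3).standard_normal((50, 3))
Q = P @ R_true.T + t_true
R_est, t_est = weighted_rigid_transform(P, Q, np.ones(50))
```

In practice the weights would come from the transport plan (e.g., each keypoint's matched mass), so unreliable matches contribute less to the estimated transformation.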

Since Algorithm 1 and the SVD are differentiable, we train the relative pose module end to end by penalizing the discrepancy between the predicted transformation and the ground-truth transformation H_{SP}.

[equation]

Loss Function

We train our global descriptors with a triplet loss. Each anchor point cloud P_a is paired with a positive sample, a point cloud of the same place, and a negative sample, a point cloud of a different place.

L_{triplet} = [ d(f(P_a), f(P_p)) − d(f(P_a), f(P_n)) + m ]^+

where d(·) is a distance function, m is the desired separation margin, and [x]^+ denotes the positive part of x.
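A minimal sketch of this loss (the Euclidean distance and the margin value are illustrative choices, not necessarily the paper's):

```python
import numpy as np

def triplet_loss(f_a, f_p, f_n, margin=0.5):
    """L = [ d(anchor, positive) - d(anchor, negative) + margin ]^+ ."""
    d_pos = np.linalg.norm(f_a - f_p)
    d_neg = np.linalg.norm(f_a - f_n)
    return max(d_pos - d_neg + margin, 0.0)

a = np.array([0.0, 0.0])
# negative already farther than positive by more than the margin: loss is 0
easy = triplet_loss(a, np.array([1.0, 0.0]), np.array([3.0, 0.0]))
# positive and negative equidistant: loss equals the margin
hard = triplet_loss(a, np.array([1.0, 0.0]), np.array([1.0, 0.0]))
```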

[equation]

The relative pose module is trained by comparing the anchor point cloud P_a = {p_a1, …, p_aJ} transformed by the predicted transformation with the same cloud transformed by the ground-truth transformation.

[equations: relative pose loss]

We also add an auxiliary loss term on the matches estimated by the unbalanced optimal transport plan T:

[equation]

We also show that the proposed optimal transport auxiliary loss significantly helps the network learn more discriminative features.

[figure]

The final loss function is a linear combination of the three components above:

[equation]

where β is a loss-balancing weight.

Other Referenced Techniques

  • Triplet Loss

Used for training on samples with similar characteristics, such as faces. The dataset provides anchor, positive, and negative examples; training pushes the anchor-positive distance below the anchor-negative distance, enabling similarity assessment between samples.

[Figure: triplet loss illustration]

The triplet loss is computed as:

L = [ d(a, p) − d(a, n) + margin ]^+
  • Sinkhorn Algorithm

Source:

Let r be the number of snack portions each customer should receive: r = [3, 3, 3, 4, 2, 2, 2, 1]^⊤. The vector r has dimension 8, which we denote n below.
Let c be the total number of portions of each snack type: c = [4, 2, 6, 4, 4]^⊤. The vector c has dimension 5, which we denote m.
In general, r and c play the role of marginal probability distributions; when treated as marginals, their elements must be normalized to sum to 1. This is used in the computations below.

Definition

U(r, c) = { T ∈ ℝ_+^{n×m} : T 1_m = r, T^⊤ 1_n = c }

U(r, c) contains all feasible allocation plans. Note that in this example each snack portion may be split arbitrarily.

Each customer's preference for each snack is stored in a matrix:

[matrix M of customer preferences]

(The matrix M can also be viewed as a cost matrix; in this example, negating its elements yields the corresponding cost matrix.) The problem can then be stated as

d_M(r, c) = min_{T ∈ U(r,c)} Σ_{i,j} T_{i,j} M_{i,j}

where

d_M(r, c)

is also known as the Wasserstein distance. A regularization term can additionally be introduced to improve the formulation, i.e.,

d_M^λ(r, c) = min_{T ∈ U(r,c)} Σ_{i,j} T_{i,j} M_{i,j} − (1/λ) h(T),   where h(T) = − Σ_{i,j} T_{i,j} log T_{i,j}

This is also known as the Sinkhorn distance. In this example, the goal is to give each person as much of their favorite snacks as possible while keeping the allocation smooth (a detailed interpretation is omitted here).

Solution: Sinkhorn
The optimal solution of the above problem can be written as

T* = diag(u) K diag(v)

where

K = e^{−λM},   u ∈ ℝ_+^n,   v ∈ ℝ_+^m

are the constants to be solved for. Concretely, the Sinkhorn pseudocode is as follows.

[Sinkhorn algorithm pseudocode]
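A minimal NumPy sketch of these alternating scaling iterations applied to the snack example (the preference/cost matrix M is not given in the text, so a random stand-in is used; the λ value and iteration count are illustrative):

```python
import numpy as np

def sinkhorn(M, r, c, lam=5.0, n_iter=1000):
    """Balanced Sinkhorn: T* = diag(u) K diag(v) with K = exp(-lam * M),
    where u, v are found by alternately rescaling rows and columns so that
    T 1 = r and T^T 1 = c."""
    K = np.exp(-lam * M)
    u = np.ones_like(r)
    for _ in range(n_iter):
        u = r / (K @ (c / (K.T @ u)))
    v = c / (K.T @ u)
    return u[:, None] * K * v[None, :]

# marginals from the snack example, normalized to sum to 1
r = np.array([3.0, 3.0, 3.0, 4.0, 2.0, 2.0, 2.0, 1.0]); r /= r.sum()
c = np.array([4.0, 2.0, 6.0, 4.0, 4.0]); c /= c.sum()
# M is not specified in the text; a random cost matrix stands in here
M = np.random.default_rng(0).random((8, 5))
T = sinkhorn(M, r, c)
```

Each iteration is just two matrix-vector products, which is why Sinkhorn scales so much better than solving the linear program directly; larger λ makes T sparser (closer to the unregularized optimal plan) at the cost of slower convergence.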
