LCDNet: Deep Loop Closure Detection and Point Cloud Registration for LiDAR SLAM
Motivation
Addresses the loop closure detection problem in SLAM.
Overall Network Architecture
LCDNet consists of three main components: a shared encoder, a place recognition head that extracts global descriptors, and a novel relative pose module that estimates the transformation between two point clouds. The relative pose module is based on unbalanced optimal transport and is implemented in a differentiable way, which enables end-to-end training.


Feature Extraction
The feature extraction stream of the network builds upon PV-RCNN. Given an input point cloud P, where each point carries four attributes (x, y, z coordinates and intensity), the network produces a set of N keypoint descriptors $F^P = \{f_1^P, \dots, f_N^P\}$, where each $f_i^P \in \mathbb{R}^D$ is a D-dimensional vector associated with the i-th keypoint.
We employ the Farthest Point Sampling (FPS) algorithm [54] to subsample the point cloud and select N uniformly distributed keypoints, as sketched below. In the Voxel Set Abstraction (VSA) module, the raw points are combined with the outputs of four feature layers to form a feature vector for each keypoint.
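
As a reference, a minimal NumPy sketch of Farthest Point Sampling (a generic implementation; function and variable names are illustrative, not taken from the LCDNet codebase):

```python
import numpy as np

def farthest_point_sampling(points: np.ndarray, n_samples: int) -> np.ndarray:
    """Greedily pick n_samples indices from points (M, 3) that are maximally spread out."""
    M = points.shape[0]
    selected = np.zeros(n_samples, dtype=np.int64)  # first keypoint: arbitrary start (index 0)
    min_dist = np.full(M, np.inf)  # distance of each point to the selected set
    for i in range(1, n_samples):
        d = np.linalg.norm(points - points[selected[i - 1]], axis=1)
        min_dist = np.minimum(min_dist, d)
        selected[i] = int(np.argmax(min_dist))  # farthest point from the current set
    return selected
```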

For each keypoint $kp_i$ and each level $l$ of the pyramid feature map, the keypoint features $f_{l,i}$ are computed as

$$f_{l,i} = \mathrm{MP}\big(G(M(S_{l,i}))\big)$$

where MP denotes max pooling, G is a multi-layer perceptron (MLP), and M randomly samples the set of neighboring voxel features $S_{l,i}$, which is defined as

$$S_{l,i} = \big\{\, [\,f_{l,j}^{\mathrm{voxel}};\; v_{l,j} - kp_i\,] \;\big|\; \|v_{l,j} - kp_i\| < r \,\big\}$$

where $f_{l,j}^{\mathrm{voxel}}$ is the feature of voxel $j$ at level $l$, $v_{l,j}$ is the coordinate of voxel $j$ at level $l$, and $r$ is the neighborhood radius. This operation is performed at every level of the pyramid.
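
A rough PyTorch sketch of this per-level aggregation, assuming a dense radius query for the neighbor lookup (all names are illustrative, not from the released implementation):

```python
import torch

def vsa_level_features(keypoints, voxel_coords, voxel_feats, radius, mlp, out_dim, max_neighbors=16):
    """keypoints: (N, 3); voxel_coords: (V, 3) and voxel_feats: (V, C) for one pyramid level."""
    feats = []
    for kp in keypoints:
        d = torch.norm(voxel_coords - kp, dim=1)
        idx = torch.nonzero(d < radius).squeeze(1)
        if idx.numel() == 0:  # no voxels in range: zero feature
            feats.append(torch.zeros(out_dim))
            continue
        idx = idx[torch.randperm(idx.numel())[:max_neighbors]]            # M: random subsample
        s = torch.cat([voxel_feats[idx], voxel_coords[idx] - kp], dim=1)  # build S_{l,i}
        feats.append(mlp(s).max(dim=0).values)                            # MP(G(.)): MLP + max-pool
    return torch.stack(feats)  # (N, out_dim)
```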

Analogous aggregation steps are performed on the raw input points and on the BEV feature map to obtain additional keypoint features.

Finally, an MLP aggregates all the keypoint features and produces the final keypoint feature vectors as

$$f_i^P = \mathrm{MLP}\big(\big[\,f_{\mathrm{raw},i};\; f_{1,i};\; \dots;\; f_{4,i};\; f_{\mathrm{bev},i}\,\big]\big)$$
Global Descriptor
The NetVLAD layer maps the $(N \times D)$-dimensional set $F^P$ into a $(K \times D)$-dimensional vector $V(F^P)$ by learning K cluster centers $\{c_1, \dots, c_K\}$, where each $c_k \in \mathbb{R}^D$.
NetVLAD can be seen as a differentiable, trainable generalization of VLAD, replacing the hard k-means assignment with a learned soft assignment.
The soft assignment is defined as

$$a_k(f_i^P) = \frac{e^{\,w_k^\top f_i^P + b_k}}{\sum_{k'=1}^{K} e^{\,w_{k'}^\top f_i^P + b_{k'}}}$$
where $w_k \in \mathbb{R}^D$ and $b_k \in \mathbb{R}$ are learnable weights and biases. Intuitively, $a_k(f_i^P)$ is the probability of assigning the feature vector $f_i^P$ to the centroid $c_k$.
The final NetVLAD descriptor $V(F^P) = [V_1(F^P), \dots, V_K(F^P)]$ aggregates the features of the whole frame by combining the original VLAD formulation with the soft assignment above:

$$V_k(F^P) = \sum_{i=1}^{N} a_k(f_i^P)\,\big(f_i^P - c_k\big)$$
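
A compact PyTorch sketch of this aggregation (a minimal NetVLAD variant; intra-normalization and other refinements of the full layer are omitted):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NetVLAD(nn.Module):
    def __init__(self, num_clusters: int, dim: int):
        super().__init__()
        self.centroids = nn.Parameter(torch.randn(num_clusters, dim))  # c_k
        self.assign = nn.Linear(dim, num_clusters)                     # w_k, b_k

    def forward(self, f):  # f: (N, D) keypoint descriptors F^P
        a = torch.softmax(self.assign(f), dim=1)              # (N, K) soft assignment a_k(f_i)
        resid = f.unsqueeze(1) - self.centroids.unsqueeze(0)  # (N, K, D) residuals f_i - c_k
        V = (a.unsqueeze(-1) * resid).sum(dim=0)              # (K, D) rows V_k(F^P)
        return F.normalize(V.flatten(), dim=0)                # (K*D,) L2-normalized descriptor
```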

We then use a simple MLP to compress the $(K \times D)$-dimensional vector $V(F^P)$ into a compact G-dimensional descriptor. A Context Gating (CG) module is applied on top of the MLP output to produce the final global descriptor $f(P) \in \mathbb{R}^G$; the key step is reweighting the MLP output with a self-attention-like gate:

$$Y = \sigma(W X + b) \circ X$$
where $X$ is the MLP output, $\sigma$ is the element-wise sigmoid, $\circ$ denotes element-wise multiplication, and $W$ and $b$ are learnable weights and biases.
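
Context Gating itself amounts to a few lines; a sketch in the same PyTorch style:

```python
import torch
import torch.nn as nn

class ContextGating(nn.Module):
    """Y = sigmoid(W X + b) ∘ X: each dimension of X is reweighted by a learned gate."""
    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Linear(dim, dim)  # W, b

    def forward(self, x):  # x: (..., G) MLP output
        return torch.sigmoid(self.gate(x)) * x
```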
Relative Pose Estimation
To match keypoints between the two point clouds, we employ the Sinkhorn algorithm.
The discrete Kantorovich formulation of the optimal transport problem is defined as

$$T^{*} = \underset{T \in \mathbb{R}_{+}^{N \times N}}{\arg\min} \; \sum_{i,j} C_{i,j}\, T_{i,j} \quad \text{s.t.} \quad T\mathbf{1} = \tfrac{1}{N}\mathbf{1},\; T^{\top}\mathbf{1} = \tfrac{1}{N}\mathbf{1}$$
where $C_{i,j}$ denotes the cost of matching the $i$-th element of $F^P$ with the $j$-th element of $F^S$. To apply the Sinkhorn algorithm, we add an entropic regularization term:

$$T^{*} = \underset{T}{\arg\min} \; \sum_{i,j} C_{i,j}\, T_{i,j} + \lambda \sum_{i,j} T_{i,j} \log T_{i,j}$$
where λ controls the sparsity of the matching (as λ approaches zero, T tends toward a bijective mapping).
However, both formulations above require T to be a doubly stochastic matrix, i.e., mass must be conserved: every point in $F^P$ must be matched to one or more points in $F^S$, and vice versa. This assumption is violated when the two point clouds only partially overlap.
One solution is to reformulate the problem as unbalanced optimal transport (UOT), which permits creation and destruction of mass and thereby relaxes the strict constraints of balanced optimal transport. It is defined as

$$T^{*} = \underset{T}{\arg\min} \; \sum_{i,j} C_{i,j}\, T_{i,j} + \lambda \sum_{i,j} T_{i,j} \log T_{i,j} + \rho\, \mathrm{KL}(T\mathbf{1} \,\|\, U) + \rho\, \mathrm{KL}(T^{\top}\mathbf{1} \,\|\, U)$$
where KL denotes the Kullback-Leibler divergence, U is the discrete uniform distribution, and ρ is a parameter governing how strictly mass is preserved.
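
A differentiable sketch of the unbalanced Sinkhorn iterations under these definitions, using the standard KL-relaxed scaling updates (the exponent ρ/(ρ+λ) softens the marginal constraints; the names and the fixed iteration count are assumptions):

```python
import torch

def unbalanced_sinkhorn(C: torch.Tensor, lam: float = 0.1, rho: float = 1.0, n_iters: int = 50):
    """C: (N, M) cost matrix. Returns a transport plan T whose marginals are only
    softly tied to the uniform distributions; rho -> infinity recovers balanced OT."""
    N, M = C.shape
    a = torch.full((N,), 1.0 / N)   # uniform source marginal U
    b = torch.full((M,), 1.0 / M)   # uniform target marginal U
    K = torch.exp(-C / lam)         # Gibbs kernel
    u, v = torch.ones(N), torch.ones(M)
    power = rho / (rho + lam)       # KL-relaxation exponent
    for _ in range(n_iters):
        u = (a / (K @ v)).pow(power)
        v = (b / (K.T @ u)).pow(power)
    return u.unsqueeze(1) * K * v.unsqueeze(0)  # T = diag(u) K diag(v)
```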

We define the cost matrix C using a cosine-based distance between the keypoint descriptors:

$$C_{i,j} = 1 - \frac{f_i^P \cdot f_j^S}{\|f_i^P\|\,\|f_j^S\|}$$
Once the unbalanced optimal transport plan T is estimated, we compute, for every keypoint $p_i$ of P, its projected position among the keypoints of S as a T-weighted average:

$$\hat{p}_i = \frac{\sum_j T_{i,j}\; p_j^S}{\sum_j T_{i,j}}$$
We then apply a weighted Singular Value Decomposition (SVD) to compute the rigid-body transformation between the original keypoints of P and their projected counterparts in S.
Since both the Sinkhorn algorithm (Algorithm 1) and the SVD are differentiable, we train the relative pose module end-to-end by penalizing the discrepancy between the predicted transformation $\hat{H}_{SP}$ and the ground-truth transformation $H_{SP}$.
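
For reference, a minimal NumPy sketch of the weighted SVD (Kabsch) alignment step; here `w` would come from the transport plan T, and all names are illustrative:

```python
import numpy as np

def weighted_svd_alignment(P: np.ndarray, Q: np.ndarray, w: np.ndarray):
    """Find R, t minimizing sum_i w_i * ||R @ P[i] + t - Q[i]||^2.
    P, Q: (N, 3) corresponding points; w: (N,) non-negative weights."""
    w = w / w.sum()
    mu_p = (w[:, None] * P).sum(axis=0)            # weighted centroids
    mu_q = (w[:, None] * Q).sum(axis=0)
    H = (P - mu_p).T @ (w[:, None] * (Q - mu_q))   # weighted cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))         # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_q - R @ mu_p
    return R, t
```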

Loss Function
We train the global descriptors with a triplet loss. For an anchor point cloud $P_a$, the positive sample is a point cloud of the same place and the negative sample a point cloud of a different place:

$$\mathcal{L}_{\mathrm{triplet}} = \big[\, d\big(f(P_a), f(P_p)\big) - d\big(f(P_a), f(P_n)\big) + m \,\big]^{+}$$
where $d(\cdot)$ is a distance function, $m$ is the desired separation margin, and $[x]^{+}$ denotes the positive part of $x$, i.e. $\max(0, x)$.

The relative pose module is trained by comparing the anchor point cloud $P_a = \{p_{a,1}, \dots, p_{a,J}\}$ transformed with the predicted transformation $\hat{H}$ against the same cloud transformed with the ground-truth transformation $H$:

$$\mathcal{L}_{\mathrm{pose}} = \frac{1}{J} \sum_{j=1}^{J} \big\| \hat{H}\, p_{a,j} - H\, p_{a,j} \big\|$$

We add an auxiliary loss on the matches estimated by the unbalanced optimal transport plan T, penalizing the distance between each projected keypoint $\hat{p}_i$ and the position of $p_i$ under the ground-truth transformation:

$$\mathcal{L}_{\mathrm{aux}} = \frac{1}{N} \sum_{i=1}^{N} \big\| \hat{p}_i - H\, p_i \big\|$$
We further show that the proposed optimal-transport auxiliary loss helps the network learn noticeably more discriminative features.

The final loss function is a linear combination of the three components above:

$$\mathcal{L} = \mathcal{L}_{\mathrm{triplet}} + \beta\,\big(\mathcal{L}_{\mathrm{pose}} + \mathcal{L}_{\mathrm{aux}}\big)$$
where β is a loss-balancing weight.
Other Referenced Techniques
- Triplet Loss

Used to train on samples with closely related identities, such as faces. The training data contains anchor, positive, and negative examples; by pushing the anchor-positive distance to be smaller than the anchor-negative distance, the model learns a similarity measure between samples.

The triplet loss is computed as:

$$\mathcal{L} = \big[\, d(a, p) - d(a, n) + \mathrm{margin} \,\big]^{+}$$
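In PyTorch this corresponds directly to `nn.TripletMarginLoss`; a minimal usage example with made-up descriptor dimensions:

```python
import torch
import torch.nn as nn

loss_fn = nn.TripletMarginLoss(margin=0.5, p=2)  # Euclidean distance, margin m = 0.5
anchor   = torch.randn(8, 256)  # e.g. global descriptors of anchor clouds
positive = torch.randn(8, 256)  # descriptors of the same places
negative = torch.randn(8, 256)  # descriptors of different places
loss = loss_fn(anchor, positive, negative)       # mean of [d(a,p) - d(a,n) + m]^+
```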
- Sinkhorn Algorithm
 
It can be illustrated with the following snack-distribution example.
Let r denote the number of snack portions each customer receives: $r = [3,\ 3,\ 3,\ 4,\ 2,\ 2,\ 2,\ 1]^{\top}$. The dimension of r is 8, which we denote n below.
Let c denote the total number of portions of each snack: $c = [4,\ 2,\ 6,\ 4,\ 4]^{\top}$. The dimension of c is 5, which we denote m.
In general, r and c act as marginal distributions, so their entries are normalized to sum to one; this property is used in the computations below.
Define

$$U(r, c) = \big\{\, T \in \mathbb{R}_{\geq 0}^{n \times m} \;\big|\; T \mathbf{1}_m = r,\; T^{\top} \mathbf{1}_n = c \,\big\}$$

U(r, c) contains every feasible allocation plan. Note that each snack portion may be split arbitrarily among customers.
Each customer's preference for each snack is stored in a matrix M.

(M can also be interpreted as a cost matrix: negating its entries turns preferences into costs; M below denotes the resulting costs.) The problem can then be stated concisely as

$$T^{*} = \underset{T \in U(r,c)}{\arg\min} \; \sum_{i,j} T_{i,j}\, M_{i,j}$$
where

$$d_M(r, c) = \sum_{i,j} T^{*}_{i,j}\, M_{i,j}$$
is also known as the Wasserstein distance. A regularization term can additionally be introduced to improve the formulation:

$$d_M^{\lambda}(r, c) = \min_{T \in U(r,c)} \; \sum_{i,j} T_{i,j}\, M_{i,j} - \frac{1}{\lambda}\, h(T), \qquad h(T) = -\sum_{i,j} T_{i,j} \log T_{i,j}$$
This quantity is also known as the Sinkhorn distance. In this example, the objective is to give each customer as much of their favorite snacks as possible, while the entropy term pushes the allocation toward being more even. (A more detailed interpretation is omitted here.)
Solution: Sinkhorn
The optimal solution of the above problem has the form

$$T^{*} = \mathrm{diag}(u)\, K\, \mathrm{diag}(v), \qquad K = e^{-\lambda M}$$

where

$$u \in \mathbb{R}^{n}, \quad v \in \mathbb{R}^{m}$$

are the scaling constants to be solved for. Concretely, the Sinkhorn iterations alternately rescale the rows and columns of K to match the marginals r and c, as sketched below.


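A runnable NumPy sketch of these iterations for the example above (the concrete preference matrix is not reproduced in these notes, so a random placeholder is used; preferences are negated to act as costs):

```python
import numpy as np

r = np.array([3., 3., 3., 4., 2., 2., 2., 1.]); r /= r.sum()  # n = 8 customers
c = np.array([4., 2., 6., 4., 4.]);             c /= c.sum()  # m = 5 snacks
M = -np.random.rand(8, 5)  # placeholder: negated preferences acting as costs
lam = 10.0

K = np.exp(-lam * M)       # elementwise kernel from the closed-form solution
u, v = np.ones(8), np.ones(5)
for _ in range(1000):      # alternately rescale rows/columns to match marginals
    u = r / (K @ v)
    v = c / (K.T @ u)
T = np.diag(u) @ K @ np.diag(v)   # optimal plan: T.sum(1) ≈ r, T.sum(0) ≈ c
sinkhorn_distance = (T * M).sum()
```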