LCDNet: Deep Loop Closure Detection and Point Cloud Registration for LiDAR SLAM
Motivation
Addresses the loop closure detection problem in SLAM.
Overall Network Architecture
LCDNet consists of three main components: a shared encoder, a place recognition head that extracts global descriptors, and a novel relative pose module that estimates the transformation between two point clouds. The relative pose module is based on unbalanced optimal transport and is implemented in a differentiable way, which enables end-to-end training.


Feature Extraction
The feature extraction stream of the network builds upon PV-RCNN. Given an input point cloud P, where each point carries four attributes (x, y, z coordinates and intensity), the network produces a set of N keypoint descriptors $F^P = \{f_1^P, \dots, f_N^P\}$, where each $f_i^P \in \mathbb{R}^D$ is a D-dimensional vector associated with the i-th keypoint.
We employ the Farthest Point Sampling (FPS) algorithm [54] to subsample the point cloud and select N uniformly distributed keypoints, as sketched below. In the Voxel Set Abstraction (VSA) module, the raw points are combined with the outputs of four feature layers to form a feature vector for each keypoint.
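
As a reference, a minimal NumPy sketch of Farthest Point Sampling (a generic implementation; function and variable names are illustrative, not taken from the LCDNet codebase):

```python
import numpy as np

def farthest_point_sampling(points: np.ndarray, n_samples: int) -> np.ndarray:
    """Greedily pick n_samples indices from points (M, 3) that are maximally spread out."""
    M = points.shape[0]
    selected = np.zeros(n_samples, dtype=np.int64)  # first keypoint: arbitrary start (index 0)
    min_dist = np.full(M, np.inf)  # distance of each point to the selected set
    for i in range(1, n_samples):
        d = np.linalg.norm(points - points[selected[i - 1]], axis=1)
        min_dist = np.minimum(min_dist, d)
        selected[i] = int(np.argmax(min_dist))  # farthest point from the current set
    return selected
```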

For each keypoint $kp_i$ and each level $l$ of the pyramid feature map, the keypoint features $f_{l,i}$ are computed as

$$f_{l,i} = \mathrm{MP}\big(G(M(S_{l,i}))\big)$$

where MP denotes max pooling, G is a multi-layer perceptron (MLP), and M randomly samples the set of neighboring voxel features $S_{l,i}$, which is defined as

$$S_{l,i} = \big\{\, [\,f_{l,j}^{\mathrm{voxel}};\; v_{l,j} - kp_i\,] \;\big|\; \|v_{l,j} - kp_i\| < r \,\big\}$$

where $f_{l,j}^{\mathrm{voxel}}$ is the feature of voxel $j$ at level $l$, $v_{l,j}$ is the coordinate of voxel $j$ at level $l$, and $r$ is the neighborhood radius. This operation is performed at every level of the pyramid.
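
A rough PyTorch sketch of this per-level aggregation, assuming a dense radius query for the neighbor lookup (all names are illustrative, not from the released implementation):

```python
import torch

def vsa_level_features(keypoints, voxel_coords, voxel_feats, radius, mlp, out_dim, max_neighbors=16):
    """keypoints: (N, 3); voxel_coords: (V, 3) and voxel_feats: (V, C) for one pyramid level."""
    feats = []
    for kp in keypoints:
        d = torch.norm(voxel_coords - kp, dim=1)
        idx = torch.nonzero(d < radius).squeeze(1)
        if idx.numel() == 0:  # no voxels in range: zero feature
            feats.append(torch.zeros(out_dim))
            continue
        idx = idx[torch.randperm(idx.numel())[:max_neighbors]]            # M: random subsample
        s = torch.cat([voxel_feats[idx], voxel_coords[idx] - kp], dim=1)  # build S_{l,i}
        feats.append(mlp(s).max(dim=0).values)                            # MP(G(.)): MLP + max-pool
    return torch.stack(feats)  # (N, out_dim)
```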

Analogous aggregation steps are performed on the raw input points and on the BEV feature map to obtain additional keypoint features.

Finally, an MLP aggregates all the keypoint features and produces the final keypoint feature vectors as

$$f_i^P = \mathrm{MLP}\big(\big[\,f_{\mathrm{raw},i};\; f_{1,i};\; \dots;\; f_{4,i};\; f_{\mathrm{bev},i}\,\big]\big)$$
Global Descriptor
The NetVLAD layer maps the $(N \times D)$-dimensional set $F^P$ into a $(K \times D)$-dimensional vector $V(F^P)$ by learning K cluster centers $\{c_1, \dots, c_K\}$, where each $c_k \in \mathbb{R}^D$.
NetVLAD can be seen as a differentiable, trainable generalization of VLAD, replacing the hard k-means assignment with a learned soft assignment.
The soft assignment is defined as

$$a_k(f_i^P) = \frac{e^{\,w_k^\top f_i^P + b_k}}{\sum_{k'=1}^{K} e^{\,w_{k'}^\top f_i^P + b_{k'}}}$$
where $w_k \in \mathbb{R}^D$ and $b_k \in \mathbb{R}$ are learnable weights and biases. Intuitively, $a_k(f_i^P)$ is the probability of assigning the feature vector $f_i^P$ to the centroid $c_k$.
The final NetVLAD descriptor $V(F^P) = [V_1(F^P), \dots, V_K(F^P)]$ aggregates the features of the whole frame by combining the original VLAD formulation with the soft assignment above:

$$V_k(F^P) = \sum_{i=1}^{N} a_k(f_i^P)\,\big(f_i^P - c_k\big)$$
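
A compact PyTorch sketch of this aggregation (a minimal NetVLAD variant; intra-normalization and other refinements of the full layer are omitted):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NetVLAD(nn.Module):
    def __init__(self, num_clusters: int, dim: int):
        super().__init__()
        self.centroids = nn.Parameter(torch.randn(num_clusters, dim))  # c_k
        self.assign = nn.Linear(dim, num_clusters)                     # w_k, b_k

    def forward(self, f):  # f: (N, D) keypoint descriptors F^P
        a = torch.softmax(self.assign(f), dim=1)              # (N, K) soft assignment a_k(f_i)
        resid = f.unsqueeze(1) - self.centroids.unsqueeze(0)  # (N, K, D) residuals f_i - c_k
        V = (a.unsqueeze(-1) * resid).sum(dim=0)              # (K, D) rows V_k(F^P)
        return F.normalize(V.flatten(), dim=0)                # (K*D,) L2-normalized descriptor
```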

We then use a simple MLP to compress the $(K \times D)$-dimensional vector $V(F^P)$ into a compact G-dimensional descriptor. A Context Gating (CG) module is applied on top of the MLP output to produce the final global descriptor $f(P) \in \mathbb{R}^G$; the key step is reweighting the MLP output with a self-attention-like gate:

$$Y = \sigma(W X + b) \circ X$$
where $X$ is the MLP output, $\sigma$ is the element-wise sigmoid, $\circ$ denotes element-wise multiplication, and $W$ and $b$ are learnable weights and biases.
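
Context Gating itself amounts to a few lines; a sketch in the same PyTorch style:

```python
import torch
import torch.nn as nn

class ContextGating(nn.Module):
    """Y = sigmoid(W X + b) ∘ X: each dimension of X is reweighted by a learned gate."""
    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Linear(dim, dim)  # W, b

    def forward(self, x):  # x: (..., G) MLP output
        return torch.sigmoid(self.gate(x)) * x
```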
Relative Pose Estimation
To match keypoints between the two point clouds, we employ the Sinkhorn algorithm.
The discrete Kantorovich formulation of the optimal transport problem is defined as

$$T^{*} = \underset{T \in \mathbb{R}_{+}^{N \times N}}{\arg\min} \; \sum_{i,j} C_{i,j}\, T_{i,j} \quad \text{s.t.} \quad T\mathbf{1} = \tfrac{1}{N}\mathbf{1},\; T^{\top}\mathbf{1} = \tfrac{1}{N}\mathbf{1}$$
where $C_{i,j}$ denotes the cost of matching the $i$-th element of $F^P$ with the $j$-th element of $F^S$. To apply the Sinkhorn algorithm, we add an entropic regularization term:

$$T^{*} = \underset{T}{\arg\min} \; \sum_{i,j} C_{i,j}\, T_{i,j} + \lambda \sum_{i,j} T_{i,j} \log T_{i,j}$$
where λ controls the sparsity of the matching (as λ approaches zero, T tends toward a bijective mapping).
However, both formulations above require T to be a doubly stochastic matrix, i.e., mass must be conserved: every point in $F^P$ must be matched to one or more points in $F^S$, and vice versa. This assumption is violated when the two point clouds only partially overlap.
One solution is to reformulate the problem as unbalanced optimal transport (UOT), which permits creation and destruction of mass and thereby relaxes the strict constraints of balanced optimal transport. It is defined as

$$T^{*} = \underset{T}{\arg\min} \; \sum_{i,j} C_{i,j}\, T_{i,j} + \lambda \sum_{i,j} T_{i,j} \log T_{i,j} + \rho\, \mathrm{KL}(T\mathbf{1} \,\|\, U) + \rho\, \mathrm{KL}(T^{\top}\mathbf{1} \,\|\, U)$$
where KL denotes the Kullback-Leibler divergence, U is the discrete uniform distribution, and ρ is a parameter governing how strictly mass is preserved.
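
A differentiable sketch of the unbalanced Sinkhorn iterations under these definitions, using the standard KL-relaxed scaling updates (the exponent ρ/(ρ+λ) softens the marginal constraints; the names and the fixed iteration count are assumptions):

```python
import torch

def unbalanced_sinkhorn(C: torch.Tensor, lam: float = 0.1, rho: float = 1.0, n_iters: int = 50):
    """C: (N, M) cost matrix. Returns a transport plan T whose marginals are only
    softly tied to the uniform distributions; rho -> infinity recovers balanced OT."""
    N, M = C.shape
    a = torch.full((N,), 1.0 / N)   # uniform source marginal U
    b = torch.full((M,), 1.0 / M)   # uniform target marginal U
    K = torch.exp(-C / lam)         # Gibbs kernel
    u, v = torch.ones(N), torch.ones(M)
    power = rho / (rho + lam)       # KL-relaxation exponent
    for _ in range(n_iters):
        u = (a / (K @ v)).pow(power)
        v = (b / (K.T @ u)).pow(power)
    return u.unsqueeze(1) * K * v.unsqueeze(0)  # T = diag(u) K diag(v)
```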

We define the cost matrix C using a cosine-based distance between the keypoint descriptors:

$$C_{i,j} = 1 - \frac{f_i^P \cdot f_j^S}{\|f_i^P\|\,\|f_j^S\|}$$
Once the unbalanced optimal transport plan T is estimated, we compute, for every keypoint $p_i$ of P, its projected position among the keypoints of S as a T-weighted average:

$$\hat{p}_i = \frac{\sum_j T_{i,j}\; p_j^S}{\sum_j T_{i,j}}$$
We then apply a weighted Singular Value Decomposition (SVD) to compute the rigid-body transformation between the original keypoints of P and their projected counterparts in S.
Since both the Sinkhorn algorithm (Algorithm 1) and the SVD are differentiable, we train the relative pose module end-to-end by penalizing the discrepancy between the predicted transformation $\hat{H}_{SP}$ and the ground-truth transformation $H_{SP}$.
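
For reference, a minimal NumPy sketch of the weighted SVD (Kabsch) alignment step; here `w` would come from the transport plan T, and all names are illustrative:

```python
import numpy as np

def weighted_svd_alignment(P: np.ndarray, Q: np.ndarray, w: np.ndarray):
    """Find R, t minimizing sum_i w_i * ||R @ P[i] + t - Q[i]||^2.
    P, Q: (N, 3) corresponding points; w: (N,) non-negative weights."""
    w = w / w.sum()
    mu_p = (w[:, None] * P).sum(axis=0)            # weighted centroids
    mu_q = (w[:, None] * Q).sum(axis=0)
    H = (P - mu_p).T @ (w[:, None] * (Q - mu_q))   # weighted cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))         # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_q - R @ mu_p
    return R, t
```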

Loss Function
We train the global descriptors with a triplet loss. For an anchor point cloud $P_a$, the positive sample is a point cloud of the same place and the negative sample a point cloud of a different place:

$$\mathcal{L}_{\mathrm{triplet}} = \big[\, d\big(f(P_a), f(P_p)\big) - d\big(f(P_a), f(P_n)\big) + m \,\big]^{+}$$
where $d(\cdot)$ is a distance function, $m$ is the desired separation margin, and $[x]^{+}$ denotes the positive part of $x$, i.e. $\max(0, x)$.

The relative pose module is trained by comparing the anchor point cloud $P_a = \{p_{a,1}, \dots, p_{a,J}\}$ transformed with the predicted transformation $\hat{H}$ against the same cloud transformed with the ground-truth transformation $H$:

$$\mathcal{L}_{\mathrm{pose}} = \frac{1}{J} \sum_{j=1}^{J} \big\| \hat{H}\, p_{a,j} - H\, p_{a,j} \big\|$$

We add an auxiliary loss on the matches estimated by the unbalanced optimal transport plan T, penalizing the distance between each projected keypoint $\hat{p}_i$ and the position of $p_i$ under the ground-truth transformation:

$$\mathcal{L}_{\mathrm{aux}} = \frac{1}{N} \sum_{i=1}^{N} \big\| \hat{p}_i - H\, p_i \big\|$$
We further show that the proposed optimal-transport auxiliary loss helps the network learn noticeably more discriminative features.

The final loss function is a linear combination of the three components above:

$$\mathcal{L} = \mathcal{L}_{\mathrm{triplet}} + \beta\,\big(\mathcal{L}_{\mathrm{pose}} + \mathcal{L}_{\mathrm{aux}}\big)$$
where β is a loss-balancing weight.
Other Referenced Techniques
- Triplet Loss

Used to train on samples with closely related identities, such as faces. The training data contains anchor, positive, and negative examples; by pushing the anchor-positive distance to be smaller than the anchor-negative distance, the model learns a similarity measure between samples.

The triplet loss is computed as:

$$\mathcal{L} = \big[\, d(a, p) - d(a, n) + \mathrm{margin} \,\big]^{+}$$
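In PyTorch this corresponds directly to `nn.TripletMarginLoss`; a minimal usage example with made-up descriptor dimensions:

```python
import torch
import torch.nn as nn

loss_fn = nn.TripletMarginLoss(margin=0.5, p=2)  # Euclidean distance, margin m = 0.5
anchor   = torch.randn(8, 256)  # e.g. global descriptors of anchor clouds
positive = torch.randn(8, 256)  # descriptors of the same places
negative = torch.randn(8, 256)  # descriptors of different places
loss = loss_fn(anchor, positive, negative)       # mean of [d(a,p) - d(a,n) + m]^+
```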
- Sinkhorn Algorithm
 
It can be illustrated with the following snack-distribution example.
Let r denote the number of snack portions each customer receives: $r = [3,\ 3,\ 3,\ 4,\ 2,\ 2,\ 2,\ 1]^{\top}$. The dimension of r is 8, which we denote n below.
Let c denote the total number of portions of each snack: $c = [4,\ 2,\ 6,\ 4,\ 4]^{\top}$. The dimension of c is 5, which we denote m.
In general, r and c act as marginal distributions, so their entries are normalized to sum to one; this property is used in the computations below.
Define

$$U(r, c) = \big\{\, T \in \mathbb{R}_{\geq 0}^{n \times m} \;\big|\; T \mathbf{1}_m = r,\; T^{\top} \mathbf{1}_n = c \,\big\}$$

U(r, c) contains every feasible allocation plan. Note that each snack portion may be split arbitrarily among customers.
Each customer's preference for each snack is stored in a matrix M.

(M can also be interpreted as a cost matrix: negating its entries turns preferences into costs; M below denotes the resulting costs.) The problem can then be stated concisely as

$$T^{*} = \underset{T \in U(r,c)}{\arg\min} \; \sum_{i,j} T_{i,j}\, M_{i,j}$$
where

$$d_M(r, c) = \sum_{i,j} T^{*}_{i,j}\, M_{i,j}$$
is also known as the Wasserstein distance. A regularization term can additionally be introduced to improve the formulation:

$$d_M^{\lambda}(r, c) = \min_{T \in U(r,c)} \; \sum_{i,j} T_{i,j}\, M_{i,j} - \frac{1}{\lambda}\, h(T), \qquad h(T) = -\sum_{i,j} T_{i,j} \log T_{i,j}$$
This quantity is also known as the Sinkhorn distance. In this example, the objective is to give each customer as much of their favorite snacks as possible, while the entropy term pushes the allocation toward being more even. (A more detailed interpretation is omitted here.)
Solution: Sinkhorn
The optimal solution of the above problem has the form

$$T^{*} = \mathrm{diag}(u)\, K\, \mathrm{diag}(v), \qquad K = e^{-\lambda M}$$

where

$$u \in \mathbb{R}^{n}, \quad v \in \mathbb{R}^{m}$$

are the scaling constants to be solved for. Concretely, the Sinkhorn iterations alternately rescale the rows and columns of K to match the marginals r and c, as sketched below.


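A runnable NumPy sketch of these iterations for the example above (the concrete preference matrix is not reproduced in these notes, so a random placeholder is used; preferences are negated to act as costs):

```python
import numpy as np

r = np.array([3., 3., 3., 4., 2., 2., 2., 1.]); r /= r.sum()  # n = 8 customers
c = np.array([4., 2., 6., 4., 4.]);             c /= c.sum()  # m = 5 snacks
M = -np.random.rand(8, 5)  # placeholder: negated preferences acting as costs
lam = 10.0

K = np.exp(-lam * M)       # elementwise kernel from the closed-form solution
u, v = np.ones(8), np.ones(5)
for _ in range(1000):      # alternately rescale rows/columns to match marginals
    u = r / (K @ v)
    v = c / (K.T @ u)
T = np.diag(u) @ K @ np.diag(v)   # optimal plan: T.sum(1) ≈ r, T.sum(0) ≈ c
sinkhorn_distance = (T * M).sum()
```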