Paper notes: "Enforcing geometric constraints of virtual normal for depth prediction"

1. Overview

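Introduction: This paper proposes a new form of loss function for depth estimation. The loss functions commonly used for this task are pixel-wise, so they supervise the network only with shallow, per-pixel information. Learning from per-pixel depth alone does not capture the fine details and structural differences in the GT depth well. The paper therefore maps both the predicted depth and the GT depth into the same 3D space, using preset camera intrinsics, and compares them there. The comparison is based on sampling in that 3D space: each sample is a group of three points, invalid groups are discarded by two sampling criteria defined in the paper, and the remaining groups define the constraint between predicted and GT depth. Because the constraint is built in 3D space, the predicted depth follows the 3D distribution of the GT more closely, which also makes the depth more accurate.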

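The figure below compares the GT, the depth predicted by this method, and the depth predicted by other methods in terms of point clouds and surface normals:
[Image: point clouds and surface normals for the GT, this method, and other methods]
Even where the surface normals and the depth maps differ only slightly, the point clouds differ much more, so constraining the prediction in the 3D space of the point cloud yields more accurate depth.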

2. Method design

2.1 Overall pipeline

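The network is an encoder-decoder depth prediction network. The main change is the virtual normal loss proposed in the paper, that is, a constraint built in 3D space, as shown below:
[Image: overall pipeline of the method]
The paper "DiverseDepth: Affine-invariant Depth Prediction Using Diverse Data" surveys the loss functions used for depth regression and groups them into three broad categories. Virtual normal, like surface normal, belongs to the affine-invariant category (see Table 2 of that paper):
[Image: Table 2 of the DiverseDepth paper]
This paper also compares virtual normal with surface normal. It points out that the surface normal depends strongly on the chosen local size, so the choice of local size can make training considerably less stable. Virtual normal, which constrains groups of points sampled in 3D space, is more robust and stable.
[Image: comparison of virtual normal and surface normal]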

2.2 Virtual normal loss

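The virtual normal loss samples points in 3D space that satisfy the paper's criteria, builds vectors from each sampled group, and compares those vectors between prediction and GT. For the implementation, see VNL_loss.py in the paper's code.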

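Let the focal lengths be $f_x,f_y$, let the current pixel and its depth be $(u_i,v_i)$ and $d_i$, and let the image center be $(u_0,v_0)$. The image coordinates can then be mapped to camera coordinates:
$z_i=d_i;\ x_i=\frac{d_i\cdot(u_i-u_0)}{f_x};\ y_i=\frac{d_i\cdot(v_i-v_0)}{f_y}$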
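
The mapping above can be sketched in a few lines of Python. This is a minimal illustration, not the repository's code: the function name `backproject` and the intrinsics values are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def backproject(u, v, d, fx, fy, u0, v0):
    """Map pixel (u, v) with depth d to camera coordinates (x, y, z)."""
    x = d * (u - u0) / fx
    y = d * (v - v0) / fy
    z = d
    return (x, y, z)


# Hypothetical intrinsics and pixel, used only for illustration.
fx = fy = 500.0
u0, v0 = 320.0, 240.0
point = backproject(u=420.0, v=240.0, d=2.0, fx=fx, fy=fy, u0=u0, v0=v0)
print(point)  # (0.4, 0.0, 2.0)
```

The pixel lies 100 px to the right of the image center, so the resulting point sits 0.4 units to the right of the optical axis at depth 2.
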
The set of sampled groups in 3D space, $\mathcal{s}=\{(P_A,P_B,P_C)_i|i=0,\dots,N\}$, is built under two sampling criteria: a non-colinear criterion and a long-range criterion.

Non-colinear criterion:
The three points of a group must not lie on the same line, so the angles between the vectors joining them must satisfy:
$\mathcal{R}_1=\{\alpha\ge\angle(\overrightarrow{P_AP_B},\overrightarrow{P_AP_C})\ge\beta,\ \alpha\ge\angle(\overrightarrow{P_BP_C},\overrightarrow{P_BP_A})\ge\beta|P\in\mathcal{s}\}$
The corresponding code is:

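    nm = torch.bmm(q_norm.view(m_batchsize * groups, index, 1), q_norm.view(m_batchsize * groups, 1, index))  # products of the vector norms
    energy = torch.bmm(proj_query, proj_key)  # pairwise dot products of the vectors
    norm_energy = energy / (nm + 1e-8)  # pairwise cosines, one 3x3 matrix per group
    norm_energy = norm_energy.view(m_batchsize * groups, -1)  # flatten to 9 entries so the count below covers the whole matrix
    mask_cos = torch.sum((norm_energy > delta_cos) + (norm_energy < -delta_cos), 1) > 3  # the 3 diagonal entries always count, so more than 3 hits marks a near-colinear group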
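
For a single group of three points, the same test can be sketched in plain Python. This is a simplified illustration, not the batched code from VNL_loss.py: the helper names are hypothetical, and the threshold `delta_cos=0.867` (roughly the cosine of 30 degrees) is an assumed value. Like the snippet above, it thresholds cosines rather than the angles $\alpha,\beta$ of the formula.

```python
import math


def cos_angle(a, b):
    """Cosine of the angle between two 3D vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b + 1e-8)


def is_near_colinear(pa, pb, pc, delta_cos=0.867):
    """True if any two edges of the triangle (pa, pb, pc) are almost parallel."""
    ab = [q - p for p, q in zip(pa, pb)]
    ac = [q - p for p, q in zip(pa, pc)]
    bc = [q - p for p, q in zip(pb, pc)]
    pairs = [(ab, ac), (ab, bc), (ac, bc)]
    return any(abs(cos_angle(u, v)) > delta_cos for u, v in pairs)


# Three points on one line are rejected; a right triangle is kept.
print(is_near_colinear((0, 0, 0), (1, 0, 0), (2, 0, 0)))  # True
print(is_near_colinear((0, 0, 0), (1, 0, 0), (0, 1, 0)))  # False
```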

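Long-range criterion:
To keep the sampled points from lying too close together, a distance threshold is used as the filter:
$\mathcal{R}_2=\{||\overrightarrow{P_kP_m}||\gt\theta|k,m\in[A,B,C],P\in\mathcal{s}\}$
where $\theta$ is the chosen distance threshold. The corresponding code is shown below (delta_diff_* are the chosen thresholds):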

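    # Assuming pw_diff holds the pairwise difference vectors of each group (axis 2 indexes x, y, z),
    # each mask is True where some difference on that axis falls below its threshold (points too close).
    mask_x = torch.sum(torch.abs(pw_diff[:, :, 0, :]) < delta_diff_x, 2) > 0
    mask_y = torch.sum(torch.abs(pw_diff[:, :, 1, :]) < delta_diff_y, 2) > 0
    mask_z = torch.sum(torch.abs(pw_diff[:, :, 2, :]) < delta_diff_z, 2) > 0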
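
The snippet above thresholds each axis separately. For comparison, here is a minimal sketch that follows the Euclidean form of $\mathcal{R}_2$ directly for one group. The function names and the value of `theta` are hypothetical and only illustrate the criterion; this is not the repository's implementation.

```python
import math


def distance(p, q):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def satisfies_long_range(pa, pb, pc, theta):
    """True if every pairwise distance in the group exceeds theta."""
    pairs = [(pa, pb), (pa, pc), (pb, pc)]
    return all(distance(p, q) > theta for p, q in pairs)


# The second group has two points only 0.1 apart, so it is rejected.
print(satisfies_long_range((0, 0, 0), (1, 0, 0), (0, 1, 0), theta=0.6))    # True
print(satisfies_long_range((0, 0, 0), (0.1, 0, 0), (0, 1, 0), theta=0.6))  # False
```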

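Once the valid groups have been selected, the vectors of each group need to be combined into a single descriptor. The paper uses the unit normal of the plane through the three points, the virtual normal: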
$\mathcal{N}=\{n_i=\frac{\overrightarrow{P_{A_i}P_{B_i}}\times\overrightarrow{P_{A_i}P_{C_i}}}{||\overrightarrow{P_{A_i}P_{B_i}}\times\overrightarrow{P_{A_i}P_{C_i}}||}|(P_A,P_B,P_C)_i\in\mathcal{s},i=0,\dots,N\}$
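
As a rough sketch of this formula for one group, the normal is the normalized cross product of the two edge vectors starting at $P_A$. The function name `virtual_normal` is hypothetical, and the code assumes the group has already passed the two criteria above (for colinear points the cross product is zero and the division fails).

```python
import math


def virtual_normal(pa, pb, pc):
    """Unit normal of the plane through pa, pb, pc: (AB x AC) / ||AB x AC||."""
    ab = [q - p for p, q in zip(pa, pb)]
    ac = [q - p for p, q in zip(pa, pc)]
    cross = [
        ab[1] * ac[2] - ab[2] * ac[1],
        ab[2] * ac[0] - ab[0] * ac[2],
        ab[0] * ac[1] - ab[1] * ac[0],
    ]
    length = math.sqrt(sum(c * c for c in cross))
    return [c / length for c in cross]


# Three points in the z = 0 plane give a normal along the z axis.
print(virtual_normal((0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 3.0, 0.0)))  # [0.0, 0.0, 1.0]
```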

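Robustness to perturbation:
Depth estimates are inevitably perturbed; in the figure below this corresponds to the point $P^{''}$. How does such a perturbation affect the normal $n$?
[Image: effect of a depth perturbation on the virtual normal]
The effect of the perturbation can be analyzed as follows: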
$\angle(n,n^{'})=\angle(\overrightarrow{OP_C},\overrightarrow{OP^{''}})=\arctan\frac{||\overrightarrow{P_CP_C^{''}}||}{||\overrightarrow{OP_C}||}\approx0$
This relation holds because the sampling criteria above constrain the distances between the sampled points, that is:
$||\overrightarrow{P_CP_C^{''}}||\ll||\overrightarrow{OP_C}||$
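
To get a feel for the size of this angle, the arctan expression can be evaluated for a few ratios. The numbers below are made up for illustration (they are not from the paper), and the names `shift` and `distance` are hypothetical stand-ins for $||\overrightarrow{P_CP_C^{''}}||$ and $||\overrightarrow{OP_C}||$.

```python
import math


def normal_deviation_deg(shift, distance):
    """Angle arctan(shift / distance), in degrees."""
    return math.degrees(math.atan2(shift, distance))


# The same shift changes the normal less when the sampled points are far apart.
for distance in (0.1, 1.0, 5.0):
    angle = normal_deviation_deg(shift=0.05, distance=distance)
    print(f"distance={distance}: {angle:.2f} degrees")
```

For a fixed shift, the deviation shrinks as the distance grows, which is the reason the long-range criterion helps.
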
In summary, the paper compares the predicted and GT normals with an L1 loss:
$L_{VN}=\frac{1}{N}(\sum_{i=0}^N||n_i^{pred}-n_i^{gt}||_1)$
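
A minimal sketch of this loss, assuming the normals have already been computed for the same groups in the prediction and in the GT. The function name `vn_loss` is hypothetical, and the code follows the formula as written here rather than the repository's implementation.

```python
def vn_loss(normals_pred, normals_gt):
    """Mean over groups of the L1 distance between paired normals."""
    total = 0.0
    for n_pred, n_gt in zip(normals_pred, normals_gt):
        total += sum(abs(p - g) for p, g in zip(n_pred, n_gt))
    return total / len(normals_pred)


# Two groups: the first normal matches the GT, the second is off by 90 degrees.
pred = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0)]
gt = [(0.0, 0.0, 1.0), (0.0, 1.0, 0.0)]
print(vn_loss(pred, gt))  # 1.0
```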

3. Experimental results

NYUD dataset:
[Image: results on the NYUD dataset]
KITTI dataset:
