
[Paper Reading] Neural Scene Graphs for Dynamic Scenes


Neural Scene Graphs for Dynamic Scenes

Table of Contents

  • Neural Scene Graphs for Dynamic Scenes
    • 1. What

    • 2. Why

    • 3. How

      • 3.1 Neural Scene Graphs
      • 3.2 Representation Models
      • 3.3 Neural Scene Graph Rendering
      • 3.4 3D Object Detection as Inverse Rendering
    • 4. Self-thoughts

1. What

What does this paper set out to do? (From the abstract and conclusion, summarized in one sentence.)

Using video and annotated tracking data, this paper composes dynamic, multi-object scenes into a learned scene graph, which can also be used for 3D object detection via inverse rendering.

2. Why

Under what conditions or needs was this research proposed (Intro)? What core problems or deficiencies does it address, what have others done, and what are the innovations? (From the Introduction and Related Work.)

This may cover Background, Question, Prior Work, and Innovation:

Traditional pipelines built on point clouds allow learning hierarchical scene representations but cannot handle highly view-dependent effects.

NeRF resolves these view-dependent effects but does not allow for hierarchical representations or dynamic scenes.

NeRF-W makes some attempts: it incorporates an appearance embedding and a decomposition into transient and static elements via uncertainty fields, but it still relies on the consistency of the static scene.

Related works mentioned:

  1. Implicit Scene Representations: Existing methods have been proposed to learn features on discrete geometric primitives, such as points, meshes, and multi-planes.

  2. Neural Rendering: Differentiable rendering functions have made it possible to learn scene representations. NeRF stands out because it outputs a color value conditioned on the ray’s direction.

  3. Scene Graph Representations: Model a scene as a directed graph which represents objects as leaf nodes.

  4. Latent Class Encoding: By adding a latent vector z to the input 3D query point, similar objects can be modeled using the same network.

Parts 3 and 4 will be introduced in more detail later.

3. How

3.1 Neural Scene Graphs

The first step of scene reconstruction is to model the scene in a specific way, which is what we introduce first.
[Figure: isometric view and graph structure of a neural scene graph]

On the left side (a), there’s an “isometric view” of a “Neural scene graph.” This graph represents the different elements of a scene as nodes and their spatial relationships as edges.

Each node is associated with a transformation (rotation and translation) and a scaling, denoted T^w_i and S_i, indicating how each node (object) is oriented and scaled within the world coordinate system W. The nodes are visualized as colored boxes, with edges indicating the relationships between them, such as the positions of objects relative to each other or to the world frame W. The objects carry latent object codes such as l_1, l_2, suggesting they represent specific objects such as cars and trucks. There is also a background node F_{\theta_{bckg}} and class nodes such as F_{\theta_{car}} and F_{\theta_{truck}}.

To sum up, we can define it as a directed acyclic graph:

{\mathcal S}=\langle{\mathcal W},C,F,L,E\rangle.

where, to supplement the figure, C is a leaf node representing the camera, F is the set of representation models (the background model and the shared per-class models), L is the set of latent object codes, and E is the set of edges, each representing either an affine transformation from node u to node v (a relationship) or a property assignment.
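As a rough illustration of how such a graph could be stored in code, here is a minimal Python sketch; the class and attribute names are hypothetical and chosen for illustration, not taken from the authors' implementation.

```python
# A minimal sketch of one way to store the scene graph S = <W, C, F, L, E>.
# All class and field names are illustrative, not the paper's code.
from dataclasses import dataclass, field

import numpy as np


@dataclass
class ObjectNode:
    node_id: int
    class_name: str          # e.g. "car" or "truck", selects the shared model F_theta_c
    latent_code: np.ndarray  # l_o, e.g. a 256-dim vector
    scale: np.ndarray        # S_o, per-axis bounding-box scale, shape (3,)
    pose_world: np.ndarray   # T^w_o, 4x4 transform relating the object to the world frame W


@dataclass
class SceneGraph:
    camera_pose: np.ndarray                            # C: camera extrinsics, 4x4
    background_model: object                           # F_theta_bckg (a NeRF-style MLP)
    class_models: dict = field(default_factory=dict)   # F: class name -> shared MLP
    objects: list = field(default_factory=list)        # leaf nodes carrying the latent codes L
    # Edges E are implicit here: each ObjectNode stores the transform that connects
    # it to the world frame, and its class_name links it to a representation model.
```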

3.2 Representation Models

[Figure: representation models for the static background and dynamic objects]

Static and dynamic scene representations are different.
For the static scene, the model is the same as the original NeRF: it takes the position (x, y, z) and direction (d_x,d_y,d_z) as input and outputs the color (c) and density (\sigma). We can summarize the process as:

\begin{aligned}[\sigma(\boldsymbol{x}),\boldsymbol{y}(\boldsymbol{x})]& =F_{\theta_{bckg,1}}(\gamma_{x}(\boldsymbol{x})) \\ \mathbf{c}(\boldsymbol{x})& =F_{\theta_{bckg,2}}(\gamma_{d}(\boldsymbol{d}),\boldsymbol{y}(\boldsymbol{x})). \end{aligned}
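To make the two-stage structure concrete, here is a minimal PyTorch-style sketch; the layer sizes, frequency counts, and names are assumptions for illustration, not the paper's exact architecture.

```python
import math

import torch
import torch.nn as nn


def positional_encoding(x, num_freqs):
    """gamma(x): append sin/cos features at increasing frequencies to each coordinate."""
    feats = [x]
    for k in range(num_freqs):
        feats.append(torch.sin((2.0 ** k) * math.pi * x))
        feats.append(torch.cos((2.0 ** k) * math.pi * x))
    return torch.cat(feats, dim=-1)


class BackgroundNeRF(nn.Module):
    """F_theta_bckg: stage 1 maps gamma_x(x) to (sigma, y); stage 2 maps (gamma_d(d), y) to c."""

    def __init__(self, x_freqs=10, d_freqs=4, hidden=256):
        super().__init__()
        self.x_freqs, self.d_freqs = x_freqs, d_freqs
        x_dim, d_dim = 3 * (1 + 2 * x_freqs), 3 * (1 + 2 * d_freqs)
        self.stage1 = nn.Sequential(
            nn.Linear(x_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden + 1))                     # outputs [sigma, y]
        self.stage2 = nn.Sequential(
            nn.Linear(hidden + d_dim, hidden // 2), nn.ReLU(),
            nn.Linear(hidden // 2, 3), nn.Sigmoid())           # outputs c in [0, 1]

    def forward(self, x, d):
        h = self.stage1(positional_encoding(x, self.x_freqs))
        sigma, y = h[..., :1], h[..., 1:]
        c = self.stage2(torch.cat([positional_encoding(d, self.d_freqs), y], dim=-1))
        return sigma, c
```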

For the dynamic scene, each object is represented by a neural radiance field.

Meanwhile, considering the computational cost, a latent vector l_o encoding an object’s representation is introduced. Conditioning on the latent code allows shared weights \theta_c between all objects of class c. Adding l_o to the input of a volumetric scene function F_{\theta_c} can be thought of as a mapping from the representation function of class c to the radiance field of object o.

In the network architecture, this 256-dimensional latent vector l_o is added as an input, resulting in the following new first stage:

[y(x),\sigma(x)]=F_{\theta_{c,1}}(\gamma_{\boldsymbol{x}}(\boldsymbol{x}),\boldsymbol{l}_{o}).

Because a dynamic object moves through the scene over time in the video, location-dependent effects are taken into account by adding the object’s global position p_o in the global frame as another input, that is:

c(x,l_o,p_o)=F_{\theta_{c,2}}(\gamma_d(d),y(x,l_o),p_o).

Notice that the x in this formulation is expressed in the local object coordinate system, after the following transformation and normalization:

x_o=S_oT_o^wx\text{ with }x_o\in[-1,1].

We query color and volume density in the local object coordinate system: when a ray traced in the global coordinate system hits an object, it has to be converted into that object’s local coordinate system, which will come up again in the rendering part.
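A small sketch of this world-to-object mapping, assuming T^w_o is available as a 4x4 homogeneous matrix that takes world points into the object frame and that the scaling S_o is realized by dividing by the bounding-box half-extents (both assumptions for illustration):

```python
import numpy as np


def world_to_object(x_world, T_o_w, box_half_extents):
    """x_o = S_o T^w_o x: rigidly transform a world point into the object frame,
    then normalize by the bounding-box half-extents so points inside map to [-1, 1].
    T_o_w: 4x4 homogeneous matrix; box_half_extents: length-3 array."""
    x_h = np.append(x_world, 1.0)          # homogeneous coordinates
    x_obj = (T_o_w @ x_h)[:3]              # rigid transform into the object frame
    return x_obj / box_half_extents        # per-axis scaling S_o
```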

So, all in all, the representation model of a dynamic object is:

F_{\theta_c}:(\boldsymbol{x}_o,\boldsymbol{d}_o,\boldsymbol{p}_o,\boldsymbol{l}_o)\rightarrow(\boldsymbol{c},\boldsymbol{\sigma});\forall\boldsymbol{x}_o\in[-1,1].
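For illustration, here is a hedged PyTorch sketch of the class-conditional object model F_{\theta_c}; the feature dimensions and layer counts are assumptions, and the inputs are assumed to be already positionally encoded in the object's local frame.

```python
import torch
import torch.nn as nn


class ObjectNeRF(nn.Module):
    """F_theta_c: per-class shared weights, conditioned on a latent code l_o and the
    object's global position p_o. A sketch only; not the paper's exact layers."""

    def __init__(self, x_feat_dim, d_feat_dim, latent_dim=256, hidden=256):
        super().__init__()
        self.stage1 = nn.Sequential(
            nn.Linear(x_feat_dim + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden + 1))                         # -> [sigma, y]
        self.stage2 = nn.Sequential(
            nn.Linear(hidden + d_feat_dim + 3, hidden // 2), nn.ReLU(),
            nn.Linear(hidden // 2, 3), nn.Sigmoid())               # -> c

    def forward(self, gamma_x, gamma_d, l_o, p_o):
        h = self.stage1(torch.cat([gamma_x, l_o], dim=-1))         # first stage with latent code
        sigma, y = h[..., :1], h[..., 1:]
        c = self.stage2(torch.cat([gamma_d, y, p_o], dim=-1))      # second stage with global position
        return c, sigma
```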

3.3 Neural Scene Graph Rendering

[Figure: ray sampling through the background and dynamic object nodes]

Each ray r_j traced through the scene is discretized with N_d sampling points at each of its m_j dynamic-node intersections and N_s sampling points in the background, as in the original NeRF, resulting in a set of quadrature points \{\{t_{i}\}_{i=1}^{N_{s}+m_{j}N_{d}}\}_{j}.

To test whether a ray passes through a dynamic object, a ray-box intersection test is used, and the ray is then transformed into the object’s local coordinates to query its properties. The compositing is the same as in NeRF:

\begin{aligned}\hat{C}(\boldsymbol{r})&=\sum_{i=1}^{N_s+m_jN_d}T_i\alpha_ic_i,\quad\text{where}\\ T_i&=\exp\left(-\sum_{k=1}^{i-1}\sigma_k\delta_k\right)\quad\text{and}\quad\alpha_i=1-\exp(-\sigma_i\delta_i)\end{aligned}
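A minimal sketch of this compositing step for a single ray, assuming the samples from the background and all intersected objects have already been merged and sorted by depth:

```python
import torch


def composite_ray(sigmas, colors, deltas):
    """NeRF-style quadrature along one ray.
    sigmas: (N,) densities, colors: (N, 3), deltas: (N,) distances between samples."""
    alphas = 1.0 - torch.exp(-sigmas * deltas)                       # alpha_i
    # T_i = exp(-sum_{k<i} sigma_k delta_k): transmittance before sample i (T_1 = 1)
    trans = torch.exp(-torch.cumsum(
        torch.cat([torch.zeros(1), sigmas * deltas])[:-1], dim=0))
    return (trans * alphas).unsqueeze(-1).mul(colors).sum(dim=0)     # C_hat(r)
```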

Finally, the loss function is:

\mathcal{L}=\sum_{\boldsymbol{r}\in\mathcal{R}}\|\hat{C}(\boldsymbol{r})-C(\boldsymbol{r})\|_{2}^{2}+\frac{1}{\sigma^{2}}\|\boldsymbol{z}\|_{2}^{2},

which also places a zero-mean Gaussian prior p(\boldsymbol{z}_o) on the latent codes, as in DeepSDF.
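A small sketch of this objective for one batch of rays, where sigma_prior is the (assumed) standard deviation of the Gaussian prior on the latent codes:

```python
import torch


def nsg_loss(pred_rgb, gt_rgb, latent_codes, sigma_prior=1.0):
    """Photometric L2 loss over a batch of rays plus the latent-code regularizer.
    pred_rgb, gt_rgb: (R, 3); latent_codes: stacked l_o vectors of the visible objects."""
    photo = ((pred_rgb - gt_rgb) ** 2).sum(dim=-1).sum()             # sum of squared errors over rays
    latent_reg = (latent_codes ** 2).sum() / (sigma_prior ** 2)      # Gaussian prior, DeepSDF-style
    return photo + latent_reg
```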

3.4 3D Object Detection as Inverse Rendering

Just like anchor-based detectors such as R-CNN (【计算机视觉】24-Object Detection-博客), the method samples anchor positions in a bird’s-eye-view plane and optimizes over anchor box positions and latent object codes that minimize the \ell_{1} image loss between the synthesized image and an observed image.
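The procedure can be sketched roughly as follows. Here render_scene stands in for the scene-graph renderer described above, and all names, anchor parameterizations, and hyperparameters are hypothetical placeholders rather than the paper's settings.

```python
# A high-level sketch of detection-by-inverse-rendering over bird's-eye-view anchors.
import torch


def detect_objects(observed_image, anchors_bev, latent_init, render_scene,
                   steps=100, lr=1e-2):
    """For each BEV anchor, optimize the box pose and latent code so the rendered
    image matches the observation under an L1 loss; return fits sorted by loss."""
    results = []
    for anchor in anchors_bev:                        # candidate (x, z, yaw) positions
        pose = anchor.clone().requires_grad_(True)
        latent = latent_init.clone().requires_grad_(True)
        optim = torch.optim.Adam([pose, latent], lr=lr)
        for _ in range(steps):
            optim.zero_grad()
            rendered = render_scene(pose, latent)     # differentiable scene-graph rendering
            loss = (rendered - observed_image).abs().mean()
            loss.backward()
            optim.step()
        results.append((loss.item(), pose.detach(), latent.detach()))
    return sorted(results, key=lambda r: r[0])        # lowest L1 loss first
```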

4. Self-thoughts

  1. How to handle shadows when merging scene graphs?
  2. I have no concrete idea yet how to improve it.
