
SGD: Street View Synthesis with Gaussian Splatting and Diffusion Prior


Abstract

Novel view synthesis for street scenes plays a critical role in autonomous driving simulation. The current mainstream techniques to achieve it are neural rendering methods such as NeRF and 3DGS. Although thrilling progress has been made, when handling street scenes, current methods struggle to maintain rendering quality at viewpoints that deviate significantly from the training viewpoints. This issue stems from the sparse training views captured by a fixed camera on a moving vehicle.

To tackle this problem, we propose a novel approach that enhances the capacity of 3DGS by leveraging a prior from a Diffusion Model along with complementary multi-modal data.

Firstly, we fine-tune a Diffusion Model by adding images from adjacent frames as a condition, while exploiting depth data from LiDAR point clouds to supply additional spatial information.
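To make the depth condition concrete, below is a minimal sketch (not the paper's code) of projecting LiDAR points into the camera image plane to obtain a sparse depth map; the intrinsics K (3x3) and the LiDAR-to-camera extrinsics T_cam_lidar (4x4) are assumed to come from the dataset calibration files.

```python
import numpy as np

def lidar_to_depth_map(points_lidar, K, T_cam_lidar, height, width):
    """Project Nx3 LiDAR points into a sparse HxW depth map (0 = no return)."""
    # Homogeneous coordinates, then transform into the camera frame.
    pts_h = np.hstack([points_lidar, np.ones((points_lidar.shape[0], 1))])
    pts_cam = (T_cam_lidar @ pts_h.T).T[:, :3]

    # Keep only points in front of the camera.
    pts_cam = pts_cam[pts_cam[:, 2] > 0.1]

    # Pinhole projection onto the image plane.
    uv = (K @ pts_cam.T).T
    u = (uv[:, 0] / uv[:, 2]).astype(int)
    v = (uv[:, 1] / uv[:, 2]).astype(int)
    z = pts_cam[:, 2]

    # Discard projections that fall outside the image.
    valid = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    u, v, z = u[valid], v[valid], z[valid]

    # Nearest point wins when several project to the same pixel.
    order = np.argsort(-z)
    depth = np.zeros((height, width), dtype=np.float32)
    depth[v[order], u[order]] = z[order]
    return depth
```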

Then we apply the Diffusion Model to regularize the 3DGS at unseen views during training.
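A minimal sketch of how such pseudo-view regularization could slot into a 3DGS training loop is shown below. Here render_fn, sample_pseudo_fn, and refine_fn are hypothetical stand-ins for the 3DGS rasterizer, the pseudo-camera sampler, and the fine-tuned Diffusion Model; the L1 losses and the weight lam are illustrative choices, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def training_step(gaussians, train_cam, optimizer,
                  render_fn, sample_pseudo_fn, refine_fn,
                  pseudo_prob=0.5, lam=0.5):
    """One optimization step with an optional diffusion-guided pseudo view.

    render_fn(gaussians, cam)  -> rendered image           (3DGS rasterizer)
    sample_pseudo_fn(cam)      -> perturbed camera          (unseen viewpoint)
    refine_fn(image, cam)      -> diffusion-refined image   (prior target)
    All three are hypothetical stand-ins, not the paper's actual API.
    """
    # Standard photometric loss on a real training view.
    rendered = render_fn(gaussians, train_cam)
    loss = F.l1_loss(rendered, train_cam.gt_image)

    # Occasionally regularize an unseen (pseudo) view: render it with 3DGS,
    # let the diffusion prior produce a plausible target, and pull the
    # rendering toward that target.
    if torch.rand(()).item() < pseudo_prob:
        pseudo_cam = sample_pseudo_fn(train_cam)
        pseudo_render = render_fn(gaussians, pseudo_cam)
        with torch.no_grad():
            target = refine_fn(pseudo_render, pseudo_cam)
        loss = loss + lam * F.l1_loss(pseudo_render, target)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```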

Experimental results validate the effectiveness of our method compared with current SOTA models and demonstrate its advantage in rendering images from broader views.

Figures

Figure 1

(a) To enable free control of the ego-vehicle in autonomous driving simulation with novel view synthesis, we propose a method that leverages the prior from a Diffusion Model to provide 3DGS with augmented views during training.

(b) Our method preserves photo-realistic rendering quality at viewpoints that are distant from the training views, while other approaches produce severe artifacts.

Figure 2

Overview of Our Method.

(a) There are two training stages in the Diffusion Model fine-tuning.

Firstly, the U-Net is fine-tuned by injecting the patchwise CLIP image features of reference images, concatenated with the CLIP text features of a text prompt.
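As a rough illustration of this conditioning (a sketch under assumptions, not the paper's code): the patchwise tokens from a CLIP vision encoder can be projected to the text-token width and concatenated with the prompt tokens before being passed to the U-Net's cross-attention. The model name and the learned linear projection below are assumptions; CLIP ViT-L/14 patch features are 1024-d while the Stable Diffusion v1.x cross-attention expects 768-d tokens, hence the projection.

```python
import torch
from transformers import (CLIPVisionModel, CLIPTextModel,
                          CLIPTokenizer, CLIPImageProcessor)

name = "openai/clip-vit-large-patch14"            # assumed CLIP backbone
vision = CLIPVisionModel.from_pretrained(name)
text_enc = CLIPTextModel.from_pretrained(name)
tokenizer = CLIPTokenizer.from_pretrained(name)
image_proc = CLIPImageProcessor.from_pretrained(name)
# Assumed trainable projection: 1024-d patch tokens -> 768-d text-token width.
proj = torch.nn.Linear(vision.config.hidden_size, text_enc.config.hidden_size)

def build_condition(reference_image, prompt):
    """Patchwise CLIP image tokens + CLIP text tokens -> U-Net cross-attn input."""
    with torch.no_grad():                          # encoders stay frozen
        pixels = image_proc(images=reference_image,
                            return_tensors="pt").pixel_values
        patch_tokens = vision(pixel_values=pixels).last_hidden_state[:, 1:]  # drop CLS
        ids = tokenizer(prompt, padding="max_length",
                        max_length=tokenizer.model_max_length,
                        truncation=True, return_tensors="pt").input_ids
        text_tokens = text_enc(input_ids=ids).last_hidden_state
    patch_tokens = proj(patch_tokens)
    # Shape (1, 77 + 256, 768): concatenated tokens fed to the U-Net cross-attention.
    return torch.cat([text_tokens, patch_tokens], dim=1)
```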

Secondly, a ControlNet is trained with the depth of the target image as the control signal.
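This second stage can be sketched with the standard Hugging Face diffusers ControlNet training pattern; the base model name, the 3-channel depth encoding, and the plain noise-prediction MSE objective below are illustrative assumptions rather than the paper's exact recipe. The cond_tokens argument corresponds to the concatenated CLIP features from the previous sketch.

```python
import torch
import torch.nn.functional as F
from diffusers import ControlNetModel, DDPMScheduler, UNet2DConditionModel

base = "runwayml/stable-diffusion-v1-5"           # assumed base model
unet = UNet2DConditionModel.from_pretrained(base, subfolder="unet")
unet.requires_grad_(False)                         # U-Net stays frozen
controlnet = ControlNetModel.from_unet(unet)       # only the ControlNet is trained
scheduler = DDPMScheduler.from_pretrained(base, subfolder="scheduler")

def controlnet_step(latents, depth, cond_tokens):
    """One denoising-loss step with depth as the control signal.

    latents:     (B, 4, 64, 64)   VAE latents of the target image
    depth:       (B, 1, 512, 512) depth of the target image (replicated to 3 channels)
    cond_tokens: (B, L, 768)      concatenated CLIP image/text condition tokens
    """
    noise = torch.randn_like(latents)
    t = torch.randint(0, scheduler.config.num_train_timesteps,
                      (latents.shape[0],), device=latents.device)
    noisy = scheduler.add_noise(latents, noise, t)

    down_res, mid_res = controlnet(
        noisy, t, encoder_hidden_states=cond_tokens,
        controlnet_cond=depth.repeat(1, 3, 1, 1), return_dict=False)

    # Gradients flow through the frozen U-Net back into the ControlNet.
    pred = unet(noisy, t, encoder_hidden_states=cond_tokens,
                down_block_additional_residuals=down_res,
                mid_block_additional_residual=mid_res).sample
    return F.mse_loss(pred, noise)
```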

(b) The fine-tuned Diffusion Model from (a) guides the 3DGS training by providing regularization in pseudo views.

For the sake of simplicity, the VAE encoder and decoder are omitted in the figure.

Figure 3


Qualitative comparisons of novel view rendering on the KITTI-360 dataset. ZipNeRF and 3DGS produce artifacts on the blue vehicle in (a) and blurry lane markings in (b), while our method preserves high rendering quality. Our method also fixes the hole on the road surface generated by 3DGS.

Figure 4

Qualitative comparisons of novel view rendering on the KITTI dataset.

Figure 5


Qualitative ablation results on different conditions and different fine-tuning schemes of the Diffusion Model. *The target view in (b) is a novel view, so its original image is left blank.

Figure 6

Relationship between the number of sampled pseudo views per iteration and (a) training speed (in iterations per second), (b) rendering speed (in frames per second, FPS). When pseudo views are introduced, the training speed decreases significantly, while the inference speed is not impacted.

Figure 7

An example of how the current method overfits the training views, while our method overcomes this problem.

Figure 8

The impact of the strength of the Diffusion Model’s prior on the generated result. *(c) is a novel view; its original image is rendered by 3DGS.

Figure 9




More qualitative results of novel view rendering on the KITTI and KITTI-360 datasets.

Discussion and Conclusion

The integration of the Diffusion Model into 3DGS introduces a notable limitation: longer training time. This is primarily caused by the time-consuming denoising operation of the Diffusion Model.

Fig. 6a shows the correlation between the number of sampled pseudo views and the training speed. A substantial decrease in training speed can be observed when moving from 0 pseudo views (standard 3DGS) to 1. Since our method does not affect the real-time inference ability of 3DGS, as illustrated in Fig. 6b, and yields improved rendering quality, we accept the longer training time for now and leave improving training efficiency as future work.
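A Fig. 6a-style measurement can be reproduced with a simple timing harness; run_iteration below is a hypothetical stand-in for one training step with a given number of pseudo views and is not part of the paper's code.

```python
import time

def iterations_per_second(run_iteration, num_pseudo_views, n_iters=50):
    """Average training speed for a given number of pseudo views per iteration."""
    for _ in range(5):                       # warm-up (CUDA init, autotuning)
        run_iteration(num_pseudo_views)
    start = time.perf_counter()
    for _ in range(n_iters):
        run_iteration(num_pseudo_views)
    return n_iters / (time.perf_counter() - start)

# Example: compare standard 3DGS (0 pseudo views) with 1-3 pseudo views.
# for k in range(4):
#     print(k, iterations_per_second(run_iteration, k))
```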

In conclusion, we present a method aimed at enhancing the capability of free-viewpoint rendering in autonomous driving scenarios. While certain limitations persist, our method maintains high-quality renderings from novel viewpoints with considerable rendering efficiency. This allows our method to offer a broader perspective within autonomous driving simulations, enabling the simulation of potentially hazardous corner cases and thus enhancing the overall safety and reliability of autonomous driving systems.
