
Gaussian Splatting in Style


Abstract

3D scene stylization extends the work of neural style transfer to 3D. A vital challenge in this problem is maintaining the uniformity of the stylized appearance across multiple views. The vast majority of previous works achieve this by training a separate 3D model for every combination of style image and multi-view image set.

In contrast, we propose a novel architecture trained on a collection of style images that, at test time, produces high-quality stylized novel views in real time.

We choose 3D Gaussian splatting as the underlying 3D scene representation for our model. We process the 3D Gaussians using a multi-resolution hash grid and a tiny MLP to obtain stylized views. The MLP is conditioned on different style codes so that it generalizes to different styles at test time.

The explicit nature of 3D Gaussians gives us inherent advantages over NeRF-based methods, including geometric consistency and a fast training and rendering regime. This enables our method to be useful for various practical use cases, such as augmented or virtual reality.

We demonstrate that our method achieves state-of-the-art performance with superior visual quality on various indoor and outdoor real-world data.

Figures

Figure 1

Given multi-view images of a real-world scene, we perform the task of scene stylization.

We train a network that, given a style image at test time, generates stylized novel views of the scene in real time, conditioned on the input style and consistent in 3D space. In contrast to popular scene stylization approaches that refit the scene for each new style, we learn this mapping with a neural network and can therefore generate novel views of the scene, even for unseen styles, at roughly 150 FPS.
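A minimal sketch of this test-time loop is given below, assuming a style encoder (e.g. pooled VGG features) and a Gaussian-splatting rasterizer behind the placeholder functions `encode_style` and `rasterize`; these helpers and the `gaussians.centers` attribute are illustrative names, not the exact implementation.

```python
# Hypothetical test-time loop: one forward pass through the color module
# recolors every Gaussian for the queried style, then each view is a single
# rasterization call over fixed geometry -- hence real-time rendering.
import torch

@torch.no_grad()
def stylized_novel_views(gaussians, color_module, style_image, cameras,
                         encode_style, rasterize):
    style_code = encode_style(style_image)              # e.g. pooled VGG features
    rgb = color_module(gaussians.centers, style_code)   # (N, 3), one forward pass
    # Means, covariances, and opacities stay fixed, so every additional view
    # only costs a rasterization of the already-recolored Gaussians.
    return [rasterize(gaussians, rgb, cam) for cam in cameras]
```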

Figure 2

The motivation for our work stems from the need for a specialized method that takes spatial information into account while stylizing a scene.

We show that, to generate stylized novel views of a scene, it is not sufficient to stylize the rendered views or to train a scene representation model on stylized 2D images. Doing so leads to a loss of information, such as the deformation of the truck's solid body shown above.

Figure 3

Here we diagrammatically show the overall architecture of our pipeline.

We employ a novel 3D Color module, jointly trained with the 3D Gaussians, that predicts new colors for each Gaussian based on the query style image at test time.
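A minimal sketch of such a 3D Color module is given below: the Gaussian centers are encoded with a simplified multi-resolution hash grid (nearest-corner lookup instead of the usual trilinear interpolation of the eight corner features), concatenated with a style code, and passed through a tiny MLP that outputs per-Gaussian RGB. All sizes, class names, and the hash function here are illustrative assumptions rather than the exact architecture.

```python
# Minimal sketch of a hash-grid + tiny-MLP color module. Sizes, names, and the
# nearest-corner hash lookup are illustrative assumptions; a full version would
# interpolate the eight corner features per level (as in Instant-NGP).
import torch
import torch.nn as nn

class HashGridEncoding(nn.Module):
    """Simplified multi-resolution hash encoding of points in [0, 1]^3."""
    def __init__(self, levels=8, table_size=2**16, feat_dim=2,
                 base_res=16, growth=1.5):
        super().__init__()
        self.res = [int(base_res * growth ** l) for l in range(levels)]
        self.tables = nn.Parameter(1e-3 * torch.randn(levels, table_size, feat_dim))
        self.table_size = table_size
        self.out_dim = levels * feat_dim

    def forward(self, x):                                    # x: (N, 3) in [0, 1]^3
        primes = torch.tensor([1, 2654435761, 805459861], device=x.device)
        feats = []
        for l, res in enumerate(self.res):
            cell = (x * res).long()                          # nearest voxel corner
            idx = (cell * primes).sum(-1) % self.table_size  # simple spatial hash
            feats.append(self.tables[l][idx])                # (N, feat_dim)
        return torch.cat(feats, dim=-1)                      # (N, out_dim)

class ColorModule(nn.Module):
    """Tiny MLP mapping hash features + a style code to per-Gaussian RGB."""
    def __init__(self, style_dim=32, hidden=64):
        super().__init__()
        self.enc = HashGridEncoding()
        self.mlp = nn.Sequential(
            nn.Linear(self.enc.out_dim + style_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3), nn.Sigmoid(),              # RGB in [0, 1]
        )

    def forward(self, centers, style_code):
        # centers: (N, 3) Gaussian means, pre-normalized to the unit cube;
        # style_code: (style_dim,) latent vector of the query style image.
        z = style_code.expand(centers.shape[0], -1)
        return self.mlp(torch.cat([self.enc(centers), z], dim=-1))
```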

Figure 4

Qualitative comparison of our method against the baselines on the TnT and LLFF datasets.

Our method achieves a highly accurate stylization based on the input style.

ARF obtains better texture, which is attributable to the fact that it is optimized separately for each style.

StylizedNeRF produces images that suffer from over-smoothing and blurriness, while StyleRF fails to capture the accurate style colors.

On the other hand, our proposed method retains the fine details present in the unstylized view while transferring the appropriate texture and colors of the style image, for both indoor and outdoor real-world datasets.

Figure 5

We show the effect of a joint training regime in which the 3D Gaussians are trained in conjunction with the 3D Color module. Joint end-to-end training helps preserve key details and geometry in the rendered stylized novel views.
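For concreteness, a sketch of one such joint-training step is shown below, assuming a standard neural-style-transfer objective (a content loss against the captured view plus a weighted style loss against the style image, both typically computed on VGG features); the actual losses, weights, and optimizer settings may differ, and `rasterize`, `content_loss`, and `style_loss` are placeholders.

```python
# Sketch of one joint-training step: gradients from a content + style loss flow
# into both the 3D Color module and the Gaussian parameters. Losses, weights,
# and helper names (`rasterize`, `content_loss`, `style_loss`) are assumptions.
def joint_training_step(gaussians, color_module, optimizer, camera, gt_view,
                        style_image, style_code, rasterize,
                        content_loss, style_loss, w_style=1.0):
    rgb = color_module(gaussians.centers, style_code)     # per-Gaussian colors
    pred = rasterize(gaussians, rgb, camera)               # stylized rendering
    loss = content_loss(pred, gt_view) + w_style * style_loss(pred, style_image)
    optimizer.zero_grad()
    loss.backward()        # updates Gaussians and color module end to end
    optimizer.step()
    return loss.item()
```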

Figure 6

Qualitative results showing the distorted geometry that results when we apply 2D stylization to rendered novel views (GS-Ada).

We also observe that training vanilla 3DGS on 2D stylized views (Ada-GS) preserves details better than GS-Ada. However, our method provides the most geometrically accurate renderings while remaining faithful to the queried style.

Figure 7

We interpolate between the latent vectors of the style images.

The four style images are shown at the four corners of the image above; they were chosen to be highly diverse so that the differences in the per-Gaussian RGB values predicted by our model are clearly visible.
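The interpolation itself can be as simple as bilinear blending of the four corner style codes, as sketched below; the blended code is then fed to the color module in place of a single style code. The blending weights here are the obvious choice, not necessarily the exact ones used to produce the figure.

```python
# Bilinear blending of the four corner style codes from Figure 7 (a sketch;
# the weighting scheme is illustrative).
def interpolate_styles(code_tl, code_tr, code_bl, code_br, u, v):
    """u, v in [0, 1]: horizontal / vertical position within the grid."""
    top = (1 - u) * code_tl + u * code_tr
    bottom = (1 - u) * code_bl + u * code_br
    return (1 - v) * top + v * bottom

# The blended code replaces a single style code when querying the color module,
# yielding per-Gaussian RGB values for that position in the style grid.
```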

Figure 8

Here we show additional qualitative ablations with ARF.

We find that, despite its aggressive texture transfer, ARF fails to create a consistent texture across the entire surface and completely misses stylizing faraway objects and backgrounds.

GSS, on the other hand, stylizes distant objects and background regions, such as the floor or buildings, uniformly and consistently.

Conclusion

In this paper, we presented a novel method for spatially consistent stylization of complex 3D scenes.

Unlike the majority of existing works, once trained, our method can take unseen style images at inference time and produce stylized novel views in real time.

By leveraging a multi-resolution hash grid and a tiny MLP, we accurately generate the stylized colors of each 3D Gaussian in a scene. Since we only perform one forward pass through the 3D Color module, we can generate novel views at around 150 FPS. We demonstrate through quantitative and qualitative results that GSS produces superior renderings, making it suitable for many practical applications.
