Gaussian Splatting in Style (Related Work)

3D Scene Representation

3D scenes can be represented in a variety of ways. The most common technique is based on **point clouds**. Similarly, some works represent 3D structures with **voxel grids**.

Neural implicit scene representations, building on works that employ signed distance fields (SDFs) and occupancies respectively, effectively address the memory challenges of such explicit representations.

Fundamentally, implicit neural representations can produce meshes and surfaces at arbitrarily high resolutions.
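
To see why such representations are resolution-independent, consider the following minimal sketch, where an analytic sphere SDF stands in for a learned implicit network: the same function can be meshed at any desired grid resolution via marching cubes (the radius and resolutions below are arbitrary illustrative choices, not from the paper).

```python
import numpy as np
from skimage.measure import marching_cubes

def sphere_sdf(points, radius=0.5):
    """Signed distance to a sphere: negative inside, positive outside.
    Stands in for a learned implicit network f(x) -> distance."""
    return np.linalg.norm(points, axis=-1) - radius

def extract_mesh(sdf, resolution):
    """Sample the SDF on a dense grid and mesh its zero level set.
    The identical SDF can be queried at any resolution."""
    lin = np.linspace(-1.0, 1.0, resolution)
    grid = np.stack(np.meshgrid(lin, lin, lin, indexing="ij"), axis=-1)
    values = sdf(grid.reshape(-1, 3)).reshape(resolution, resolution, resolution)
    spacing = (lin[1] - lin[0],) * 3
    verts, faces, _, _ = marching_cubes(values, level=0.0, spacing=spacing)
    return verts, faces

# The same representation yields coarse or fine meshes on demand.
for res in (32, 64, 128):
    verts, faces = extract_mesh(sphere_sdf, res)
    print(f"resolution={res}: {len(verts)} vertices, {len(faces)} faces")
```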

Recently, Neuralangelo set a new standard in the field, reconstructing highly detailed 3D surfaces of large-scale environments from multiple input images.

Neural radiance fields (NeRFs) [50] follow a similar direction and likewise represent the scene implicitly.


Novel View Synthesis

Novel view synthesis (NVS) refers to generating unseen perspectives of an object or scene from a set of input images.

Classical research generated novel perspectives directly from multi-view image sets. With the advent of deep learning, emerging techniques performed NVS with neural networks.

These approaches differ in how they represent the scene. The introduction of NeRFs marked a significant milestone in the field of NVS.

NeRFs model the scene as the parameters of a multi-layer perceptron (MLP). The MLP receives a spatial point and a viewing direction as input and outputs the density and color at that point.
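
As a concrete illustration of this parameterization, here is a minimal PyTorch sketch of a NeRF-style MLP. It is a simplified stand-in, not the actual architecture: real NeRFs additionally apply positional encoding to the inputs and use deeper networks with a skip connection, and all names and sizes below are illustrative.

```python
import torch
import torch.nn as nn

class TinyNeRF(nn.Module):
    """Minimal NeRF-style MLP: (3D point, view direction) -> (density, RGB)."""
    def __init__(self, hidden=256):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.density_head = nn.Linear(hidden, 1)   # view-independent density
        self.color_head = nn.Sequential(           # view-dependent color
            nn.Linear(hidden + 3, hidden // 2), nn.ReLU(),
            nn.Linear(hidden // 2, 3), nn.Sigmoid(),
        )

    def forward(self, xyz, view_dir):
        h = self.trunk(xyz)
        sigma = torch.relu(self.density_head(h))   # density must be >= 0
        rgb = self.color_head(torch.cat([h, view_dir], dim=-1))
        return sigma, rgb

model = TinyNeRF()
points = torch.rand(1024, 3)   # sample positions along camera rays
dirs = torch.nn.functional.normalize(torch.rand(1024, 3), dim=-1)
sigma, rgb = model(points, dirs)
print(sigma.shape, rgb.shape)  # torch.Size([1024, 1]) torch.Size([1024, 3])
```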

Despite its success, NeRF faces challenges that limit its broader applicability. Slow training and significant computational overhead make it impractical for many real-world scenarios. Although NeRF performs exceptionally well with dense input views covering multiple angles of a scene, its ability to capture fine details often degrades for distant viewpoints not covered by the training data.

Subsequent studies have targeted the NVS problem given only a small number of input views.

Meanwhile, some works pursued a more generalized approach, moving away from fitting an MLP to each new scene, while several others improved the original method's training efficiency and inference speed.

Recently, 3D Gaussian Splatting (3DGS) also tackled the task of novel view synthesis in real time.


Style Transfer and Scene Stylization

Neural style transfer applies the artistic style of one image to the content of another, producing an image with distinct visual characteristics. The task is challenging because essential details from both images must be preserved and conveyed.

This problem can also be modeled as an instance of texture transfer, and it was addressed in several works prior to the widespread use of neural networks.

Later, pioneering work brought neural networks to style transfer, using a pretrained VGG model to extract semantic features from images at various hierarchical stages.

These works minimized a loss function composed of two components: a content loss and a style loss, which quantify the similarity of the generated image to the content image and to the style image, respectively. Approaches that optimize this objective for each content/style pair individually are impractical for real-time applications.
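
As a minimal sketch of these two loss terms, assuming feature maps already extracted from a pretrained VGG: the content loss compares raw features at one layer, while the style loss compares Gram matrices across several layers. Layer choices and weights are left illustrative.

```python
import torch

def gram_matrix(feat):
    """Channel-wise feature correlations; discards spatial layout,
    which makes this a statistic of style rather than content."""
    b, c, h, w = feat.shape
    f = feat.reshape(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def content_loss(gen_feat, content_feat):
    # Match features at one layer to keep the content image's structure.
    return torch.mean((gen_feat - content_feat) ** 2)

def style_loss(gen_feats, style_feats):
    # Match Gram matrices across several layers to mimic the style image.
    return sum(torch.mean((gram_matrix(g) - gram_matrix(s)) ** 2)
               for g, s in zip(gen_feats, style_feats))

# total = content_weight * content_loss(...) + style_weight * style_loss(...)
# Optimizing the generated image's pixels against this objective per image
# pair is what makes these approaches too slow for real-time use.
```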

Subsequent works turned this into a real-time solution while offering the ability to condition on arbitrary styles in a general manner.
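
One widely used mechanism for such generalized conditioning is adaptive instance normalization (AdaIN), which aligns the channel-wise statistics of the content features with those of the style features; the sketch below is illustrative, and the works surveyed here may use different conditioning schemes.

```python
import torch

def adain(content_feat, style_feat, eps=1e-5):
    """Adaptive instance normalization: re-scale content features so their
    per-channel mean/std match the style features. A single feed-forward
    decoding pass then enables real-time, arbitrary-style transfer."""
    c_mean = content_feat.mean(dim=(2, 3), keepdim=True)
    c_std = content_feat.std(dim=(2, 3), keepdim=True) + eps
    s_mean = style_feat.mean(dim=(2, 3), keepdim=True)
    s_std = style_feat.std(dim=(2, 3), keepdim=True) + eps
    return s_std * (content_feat - c_mean) / c_std + s_mean

content = torch.rand(1, 512, 32, 32)  # VGG features of the content image
style = torch.rand(1, 512, 32, 32)    # VGG features of the style image
stylized = adain(content, style)
print(stylized.shape)                 # torch.Size([1, 512, 32, 32])
```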

Furthermore, certain studies incorporated advances in monocular depth estimation to preserve the 3D structure of the source image.

Additionally, some studies extended 2D style transfer to video sequences by leveraging techniques such as optical flow. However, our experiments show that naively extending 2D style transfer to 3D scenes leads to visual artifacts, including blurriness and inconsistencies across viewpoints.

Stylizing scenes based on a style image has become prominent, notably since AR/VR headsets entered mainstream use.

One way to differentiate the various approaches is by their underlying scene representation.

Some works use a point-cloud-based representation: they project 2D style features into 3D, transform them to match the style image, and re-project them back into 2D to obtain stylized views.
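
The following schematic sketch illustrates this project-transform-reproject idea with a simple pinhole camera model. The statistics-matching transform and nearest-pixel splatting are stand-ins for the learned components in these works, and all names and shapes are illustrative.

```python
import torch

def unproject(depth, feats, K_inv):
    """Lift per-pixel 2D features into a 3D point cloud using depth.
    depth: (H, W), feats: (C, H, W), K_inv: inverse camera intrinsics."""
    H, W = depth.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=-1).float()
    points = (pix @ K_inv.T) * depth.unsqueeze(-1)        # camera-space XYZ
    return points.reshape(-1, 3), feats.reshape(feats.shape[0], -1).T

def stylize_features(point_feats, style_feats, eps=1e-5):
    """Match mean/std of the 3D point features to the style image's
    features (a stand-in for the learned transforms in these works)."""
    c_mean, c_std = point_feats.mean(0), point_feats.std(0) + eps
    s_mean, s_std = style_feats.mean(0), style_feats.std(0) + eps
    return s_std * (point_feats - c_mean) / c_std + s_mean

def project(points, feats, K, H, W):
    """Re-project stylized 3D point features into a target view by
    nearest-pixel splatting (real systems use learned renderers)."""
    uv = points @ K.T
    uv = uv[:, :2] / uv[:, 2:3].clamp(min=1e-6)
    out = torch.zeros(feats.shape[1], H, W)
    u, v = uv[:, 0].round().long(), uv[:, 1].round().long()
    valid = (u >= 0) & (u < W) & (v >= 0) & (v < H) & (points[:, 2] > 0)
    out[:, v[valid], u[valid]] = feats[valid].T
    return out

# Usage: lift view features, restyle them, splat into a target view.
C, H, W = 64, 48, 64
K = torch.tensor([[50.0, 0, W / 2], [0, 50.0, H / 2], [0, 0, 1.0]])
points, pf = unproject(torch.full((H, W), 2.0), torch.rand(C, H, W), torch.inverse(K))
restyled = stylize_features(pf, torch.rand(1000, C))
target = project(points, restyled, K, H, W)   # (C, H, W) stylized feature map
```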

Similarly, one work performs scene stylization for indoor scenes using a **mesh**.

Radiance fields have recently attracted considerable attention from the research community owing to their ability to render novel views with exceptional detail and precision. Their high-quality outputs have prompted numerous studies that leverage such models for realistic scene stylization.

A standard framework for this task is to optimize the radiance field for each distinct style image, and many techniques follow it. Some approaches employ hypernetworks to transfer style attributes onto views of intricate 3D environments. In contrast, certain studies build generalized scene stylization frameworks, in which stylized images are produced at inference time given conditioning input. Our proposed approach also follows this paradigm.

An alternative line of work exploits additional data sources such as depth maps, with the main objective of preserving geometry. In this study, we utilize a 3DGS representation instead of the conventional radiance field approach.

Another approach conditions the stylization on text. Leveraging large language models (LLMs), certain works perform scene editing and style manipulation through text-based interfaces.

Our work is not comparable to text-based approaches; we focus exclusively on methods that condition the scene solely on style images.

While text can convey informative and descriptive cues, image-based conditioning is significantly more challenging because a style image supplies a variety of detailed information, including the level of abstraction, stroke complexity, and other structural features.
