StylizedGS: Controllable Stylization for 3D Gaussian Splatting (Related Work)
Image Style Transfer
Style transfer aims to generate synthetic images with the artistic style of a given style image while preserving the original content.
Seminal neural style transfer methods iteratively optimize the output image using a Gram-matrix style loss and a content loss computed from features extracted by VGG-Net.
Subsequent works have explored alternative style loss formulations to enhance semantic consistency and capture high-frequency style details such as brushstrokes.
Feed-forward transfer methods instead train neural networks to capture style information from the style image and transfer it to the input image in a single forward pass, enabling faster stylization.
Recent improvements to the style loss replace the global Gram matrix with a nearest-neighbor feature matrix, improving texture preservation (both losses are sketched after this paragraph).
Some methods adopt patch matching for image generation, such as Fast PatchMatch and PatchMatch, but they are limited to specific views.
However, combining neural style transfer with novel view synthesis methods without considering 3D geometry can lead to blurriness and view inconsistencies.
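To make the two style losses discussed above concrete, the following is a minimal PyTorch sketch, not code from any cited work, of a Gram-matrix style loss and a nearest-neighbor feature-matching loss on VGG-16 features; the chosen layers, normalization, and use of torchvision's VGG-16 are illustrative assumptions.

```python
# Minimal sketch of the two image style losses mentioned above, assuming
# torchvision's VGG-16 as the feature extractor. Layer indices and
# normalization are illustrative choices, not taken from any cited paper.
import torch
import torch.nn.functional as F
from torchvision.models import vgg16

vgg = vgg16(weights="IMAGENET1K_V1").features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)

def extract_features(img, layers=(3, 8, 15, 22)):
    """Collect selected ReLU activations of VGG-16 for an image of shape (1, 3, H, W)."""
    feats, x = [], img
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i in layers:
            feats.append(x)
    return feats

def gram(f):
    """Channel-by-channel correlation matrix of a feature map (Gram-matrix statistics)."""
    b, c, h, w = f.shape
    return torch.einsum("bchw,bdhw->bcd", f, f) / (c * h * w)

def gram_loss(pred_feats, style_feats):
    """Match global feature statistics between the output and the style image."""
    return sum(F.mse_loss(gram(fp), gram(fs)) for fp, fs in zip(pred_feats, style_feats))

def nnfm_loss(pred_feats, style_feats):
    """Match each output feature to its cosine-nearest style feature (nearest-neighbor matching)."""
    loss = 0.0
    for fp, fs in zip(pred_feats, style_feats):
        p = F.normalize(fp.flatten(2), dim=1)     # (B, C, N) unit-norm feature vectors
        s = F.normalize(fs.flatten(2), dim=1)     # (B, C, M)
        cos = torch.einsum("bcn,bcm->bnm", p, s)  # pairwise cosine similarities
        loss = loss + (1.0 - cos.max(dim=2).values).mean()
    return loss

def content_loss(pred_feats, content_feats):
    """Keep the output close to the content image in feature space."""
    return sum(F.mse_loss(fp, fc) for fp, fc in zip(pred_feats, content_feats))
```

In the iterative setting, the output image itself is the optimized variable and a weighted sum of a style loss and the content loss is minimized by gradient descent; feed-forward methods instead train a network to minimize the same objective over a dataset of content images.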
3D Gaussian Splatting
3DGS has emerged as a robust approach for real-time radiance field rendering and 3D scene reconstruction.
Recent methods enhance the semantic understanding of 3D scenes and enable efficient text-based editing using pre-trained 2D models.
Despite these advancements, existing 3DGS works lack support for image-based 3D scene stylization that faithfully transfers detailed style features while offering flexible control.
By utilizing the 3DGS representation, this work substantially reduces training and rendering times for stylization, thereby facilitating interactive perceptual control over stylized scenes.
3D Style Transfer
3D scene style transfer aims to apply a reference style to a scene with both style fidelity and multi-view consistency.
With the increasing demand for 3D content, neural style transfer has been extended to various 3D representations.
Stylization on meshes often utilizes differentiable rendering to propagate style transfer objectives from rendered images to 3D meshes, enabling geometric or texture transfer.
Other works, using point clouds as the 3D proxy, ensure 3D consistency when stylizing novel views. For instance, some works employ featurized 3D point clouds modulated with the style image, followed by a 2D CNN renderer to generate stylized renderings. However, the performance of explicit methods is constrained by the quality of geometric reconstructions, often leading to noticeable artifacts in complex real-world scenes.
Therefore, implicit methods such as NeRF have gained considerable attention for their enhanced capacity to represent complex scenes. Numerous NeRF-based stylization networks incorporate image style transfer losses during training or are supervised by a mutually learned image stylization network to optimize color-related parameters based on a reference style.
These approaches support both appearance and geometric stylization to mimic the reference style, achieving consistent results in novel-view stylization. However, they involve time-consuming optimization and exhibit slow rendering due to expensive random sampling in volume rendering. They also lack flexible and accurate user-level perceptual control over stylization.
While controlling perceptual factors such as color, stroke size, and spatial aspects has been extensively explored in image-domain style transfer, perceptual control remains largely unexplored in 3D stylization.
Some works establish semantic correspondences to transfer style across the entire stylized scene, but they are limited to spatial control and do not allow users to interactively specify arbitrary regions.
ARF-plus introduces more perceptual controllability into the stylization of radiance fields, yet the demand for enhanced flexibility and personalized, diverse characteristics in 3D stylization remains unmet.
This work achieves rapid stylization within a minute of training while ensuring real-time rendering capabilities. It adeptly captures distinctive details from the style image and preserves recognizable scene content with fidelity. Additionally, we empower users with perceptual control over color, scale, and spatial factors for customized stylization.

