GaussianEditor: Swift and Controllable 3D Editing with Gaussian Splatting
Abstract
3D editing plays a crucial role in many areas such as gaming and virtual reality. Traditional 3D editing methods, which rely on representations like meshes and point clouds, often fall short in realistically depicting complex scenes. On the other hand, methods based on implicit 3D representations, like NeRF, render complex scenes effectively but suffer from slow processing speeds and limited control over specific scene areas.
In response to these challenges, our paper presents GaussianEditor, an innovative and efficient 3D editing algorithm based on 3DGS, a novel 3D representation.
GaussianEditor enhances precision and control in editing through our proposed Gaussian semantic tracing, which traces the editing target throughout the training process.
Additionally, we propose Hierarchical Gaussian splatting (HGS) to achieve stabilized and fine results under stochastic generative guidance from 2D diffusion models.
We also develop editing strategies for efficient object removal and integration, a challenging task for existing methods. Our comprehensive experiments demonstrate GaussianEditor's superior control, efficacy, and rapid performance, marking a significant advancement in 3D editing.
Figure 1

Results of GaussianEditor.
GaussianEditor offers swift, controllable, and versatile 3D editing. A single editing session takes only 5-10 minutes. Please note our precise editing control, where only the desired parts are modified. Taking the "Make the grass on fire" example from the first row of the figure, other objects in the scene such as the bench and tree remain unaffected.
Figure 2

Illustration of Gaussian semantic tracing. Prompt: "Turn him into an old lady".
The red mask in the images represents the projection of the Gaussians that will be updated and densified. The dynamic change of the masked area during the training process, as driven by the updating of Gaussians, ensures consistent effectiveness throughout the training duration.
Despite starting with potentially inaccurate segmentation masks due to 2D segmentation errors, Gaussian semantic tracing still guarantees high-quality editing results.
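The idea of lifting 2D masks to per-Gaussian labels can be sketched as below. This is a minimal illustration, not the paper's implementation: it assumes Gaussians are labeled by projecting their centers into several views and voting against each view's 2D segmentation mask (the projection convention and the `trace_gaussians` helper are hypothetical).

```python
import numpy as np

def trace_gaussians(centers, views, masks, thresh=0.5):
    """Label each Gaussian by projecting its center into multiple views
    and voting against the corresponding 2D segmentation masks.

    centers: (N, 3) Gaussian centers in world space
    views:   list of (3, 4) camera projection matrices (world -> pixel)
    masks:   list of (H, W) boolean 2D masks, one per view
    Returns a boolean per-Gaussian semantic label.
    """
    n = centers.shape[0]
    votes = np.zeros(n)
    hom = np.concatenate([centers, np.ones((n, 1))], axis=1)  # homogeneous coords
    for P, m in zip(views, masks):
        proj = hom @ P.T                                  # (N, 3), pixel coords up to scale
        uv = (proj[:, :2] / proj[:, 2:3]).round().astype(int)
        h, w = m.shape
        inside = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
        hit = np.zeros(n, dtype=bool)
        hit[inside] = m[uv[inside, 1], uv[inside, 0]]     # mask lookup at projected pixel
        votes += hit
    return votes / len(views) >= thresh
```

Because the label lives on the Gaussians themselves, the projected mask moves with them during training, which is why the masked region stays valid even as the scene is updated and densified.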
Figure 3

3D inpainting for object incorporation.
GaussianEditor is capable of adding objects at specified locations in a scene, given a 2D inpainting mask and a text prompt from a single view. The whole process takes merely five minutes.
Figure 4

3D inpainting for object removal.
Typically, removing the target object based on a Gaussian semantic mask generates artifacts at the interface between the target object and the scene. To address this, we generate a repaired image using a 2D inpainting method and employ Mean Squared Error (MSE) loss for supervision. The whole process takes merely two minutes.
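The supervision step above amounts to an MSE penalty between the current render and the 2D-inpainted "repaired" image, restricted to the hole left by the removed object. A minimal sketch, assuming a boolean hole mask (the `inpaint_mse_loss` helper is illustrative, not the paper's code):

```python
import numpy as np

def inpaint_mse_loss(render, repaired, hole_mask):
    """MSE between the current render and a 2D-inpainted target image,
    computed only inside the removal region.

    render, repaired: (H, W, 3) float images in [0, 1]
    hole_mask:        (H, W) boolean mask of the interface/hole region
    """
    if not hole_mask.any():
        return 0.0
    diff = (render - repaired) ** 2
    return diff[hole_mask].mean()   # supervise only the masked pixels
```

Restricting the loss to the hole keeps the rest of the scene pinned to its original appearance while the interface is repaired.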
Figure 5

Qualitative comparison.
It’s important to note the level of control we maintain over the editing area (the whole body of the man).
Background and other non-target regions are essentially unaffected, in contrast to Instruct-Nerf2Nerf where the entire scene undergoes changes.
GaussianEditor-DDS and GaussianEditor-iN2N indicate that we utilize delta denoising score and Instruct-Nerf2Nerf respectively, as guidance for editing.
Figure 6

Extensive Results of GaussianEditor.
Our method is capable of various editing tasks, including face and scene editing. In face and bear editing, we restrict the editing area to the face using Gaussian semantic tracing, ensuring that undesired areas remain unchanged. The leftmost column demonstrates the original view, while the right three columns show the images after editing.
Figure 7

Ablation study on Hierarchical Gaussian Splatting (HGS). Prompt: "make the grass on fire".
Even when specifying the editing area with prompts, generative methods like InstructPix2Pix tend to edit the entire 2D image.
Without HGS, Gaussians tend to conform to this whole-image editing by spreading and densifying across the entire scene, leading to uncontrollable densification and blurring of the image.
With HGS, however, this kind of diffusion is effectively restrained.
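One way to picture how HGS restrains this diffusion is an anchor penalty that ties each Gaussian to its state at the time its generation was created, with older generations constrained more strongly than newly densified ones. The sketch below is a simplified illustration under that assumption; the geometric weighting and the `hgs_anchor_loss` helper are hypothetical choices, not the paper's exact formulation.

```python
import numpy as np

def hgs_anchor_loss(positions, anchors, generation, decay=0.5):
    """Hierarchical anchor penalty for Gaussians grouped by densification generation.

    positions:  (N, 3) current Gaussian centers
    anchors:    (N, 3) centers recorded when each Gaussian's generation was created
    generation: (N,) integer generation index (0 = oldest)
    decay:      per-generation weight decay; older Gaussians get larger weights
    """
    w = decay ** generation.astype(float)            # gen 0 -> weight 1, newer -> smaller
    per_gauss = ((positions - anchors) ** 2).sum(axis=1)
    return (w * per_gauss).mean()
```

Under such a penalty, the stochastic gradients from 2D diffusion guidance can still reshape recently spawned Gaussians, while the established scene resists drifting and over-densifying across the whole image.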
Figure 8

Ablation study on Semantic Tracing. Tracing prompt: "Man". As shown in Fig. 7, generative guidance techniques such as InstructPix2Pix exhibit a tendency to modify the entire 2D image. With the assistance of semantic tracing, we can confine the editing region to the desired area.
Figure 9

Semantic Tracing with Point-based Prompts.
(a) Users provide key points on a view by clicking the screen with the mouse.
(b) We segment the target object based on these points.
(c-d) The results after removing the segmented objects. As can be seen above, our point-based tracing method offers high precision and interactivity.
Figure 10

Object Incorporation with WebUI.
Empowered by our interactive WebUI, the depth scale addresses the limitation of monocular depth estimation, which cannot guarantee precise depth map predictions. As can be seen in (a), inaccurate depth estimates can lead to failures when aligning generated objects with Gaussian scenes. We leverage the interactive nature of our WebUI to dynamically adjust the estimated depth in real time, thus resolving this issue, as demonstrated in the other images.
Figure 11

More results of GaussianEditor. GaussianEditor allows for fast, versatile, high-resolution 3D editing, requiring only 2-7 minutes and 10-20GB of GPU memory on a single A6000 GPU. Please note that the background of face editing scenes remains unchanged.
Conclusion
In our research, we introduce GaussianEditor, an innovative 3D editing algorithm based on Gaussian Splatting, designed for enhanced control and efficiency.
Our method employs Gaussian semantic tracing for precise identification and targeting of editing areas, followed by Hierarchical Gaussian Splatting (HGS) to balance fluidity and stability in achieving detailed results under stochastic guidance.
Additionally, we developed a specialized 3D inpainting algorithm for GS, streamlining object removal and integration, and greatly reducing editing time.
Similar to previous 3D editing works based on 2D diffusion models, GaussianEditor relies on these models to provide effective supervision. However, current 2D diffusion models struggle to offer effective guidance for certain complex prompts, leading to limitations in 3D editing.
