Mip-Splatting: Alias-free 3D Gaussian Splatting
Abstract
Recently, 3D Gaussian Splatting has demonstrated impressive novel view synthesis results, reaching high fidelity and efficiency. However, strong artifacts can be observed when changing the sampling rate (e.g., by changing the focal length or camera distance). We find that this phenomenon can be attributed to the lack of 3D frequency constraints and the usage of a 2D dilation filter.
To address this problem, we introduce a 3D smoothing filter which constrains the size of the 3D Gaussian primitives based on the maximal sampling frequency induced by the input views, eliminating high-frequency artifacts when zooming in. Moreover, replacing 2D dilation with a 2D Mip filter, which simulates a 2D box filter, effectively mitigates aliasing and dilation issues.
Our evaluation, including scenarios such as training on single-scale images and testing on multiple scales, validates the effectiveness of our approach.
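To make the two filters concrete, the following is a minimal sketch of how they could act on a Gaussian's covariance and opacity. It assumes the smoothing filter is realized as a Gaussian convolution (so covariances add); the tensor shapes, function names, and the scale hyperparameters s3d and s2d are illustrative, not values taken from the paper.

```python
import torch

def smooth_3d(cov3d, opacity, max_rate, s3d=0.1):
    """3D smoothing filter (sketch): convolve each 3D Gaussian with an
    isotropic Gaussian whose variance shrinks as the maximal sampling
    rate grows. Convolving two Gaussians adds their covariances; the
    opacity is rescaled so each primitive's total contribution is
    preserved. cov3d: (N, 3, 3), opacity: (N,), max_rate: (N,);
    s3d is an assumed scale hyperparameter."""
    var = s3d / max_rate.clamp(min=1e-8) ** 2
    eye = torch.eye(3, device=cov3d.device)
    cov_f = cov3d + var[:, None, None] * eye          # band-limited covariance
    scale = torch.sqrt(torch.det(cov3d) / torch.det(cov_f))
    return cov_f, opacity * scale

def mip_filter_2d(cov2d, opacity, s2d=0.1):
    """2D Mip filter (sketch): instead of dilating the projected 2D
    covariance by a fixed amount, convolve it with a small screen-space
    Gaussian approximating a single-pixel box filter, again rescaling
    opacity. cov2d: (N, 2, 2), opacity: (N,)."""
    eye = torch.eye(2, device=cov2d.device)
    cov_f = cov2d + s2d * eye
    scale = torch.sqrt(torch.det(cov2d) / torch.det(cov_f))
    return cov_f, opacity * scale
```

Note how both filters rescale opacity so that the overall contribution of each primitive is preserved after filtering.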
Figure 1

(a) 3DGS renders images by representing 3D objects as 3D Gaussians, which are projected onto the image plane, followed by 2D dilation in screen space.
(b) The method's intrinsic shrinkage bias leads to degenerate 3D Gaussians that exceed the sampling limit while rendering similarly in 2D due to the dilation operation.
However, when changing the sampling rate (via the focal length or camera distance), we observe strong dilation effects (c) and high-frequency artifacts (d).
Figure 2

We trained all models on single-scale images and rendered images at different resolutions by changing the focal length. While all methods show similar performance at the training scale, we observe strong artifacts in previous work when changing the sampling rate. By contrast, our Mip-Splatting renders faithful images across different scales.
Figure 3

Sampling limits. A pixel corresponds to a sampling interval $\hat{T}$ in world space. We band-limit the 3D Gaussians by the maximal sampling rate (i.e., minimal sampling interval) among all observations. This example shows 5 cameras at different depths $d_n$ and with different focal lengths $f_n$. Here, camera 3 determines the minimal sampling interval $\hat{T}$ and hence the maximal sampling rate $\hat{\nu}$.
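As a worked instance of this caption (with made-up depths and focal lengths, since the figure's actual values are not recoverable here), a pixel of camera n at depth $d_n$ covers roughly $d_n / f_n$ world units, and the camera minimizing this interval determines the maximal sampling rate:

```python
# Hypothetical depths (world units) and focal lengths (pixels);
# the figure's actual values are not given.
depths = [2.0, 4.0, 1.0, 3.0, 5.0]
focals = [800.0, 800.0, 1200.0, 600.0, 400.0]

# One pixel at depth d_n covers roughly d_n / f_n world units, so the
# sampling rate induced by camera n is f_n / d_n.
intervals = [d / f for d, f in zip(depths, focals)]
best = min(range(len(intervals)), key=intervals.__getitem__)
print(f"camera {best + 1}: T_hat = {intervals[best]:.5f}, "
      f"nu_hat = {1 / intervals[best]:.0f}")
# -> camera 3: T_hat = 0.00083, nu_hat = 1200
```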
Figure 4

Single-scale Training and Multi-scale Testing on the Blender Dataset. All methods are trained at full resolution and evaluated at different (smaller) resolutions to mimic zoom-out.
Methods based on 3DGS capture fine details better than Mip-NeRF and Tri-MipRF at training resolution. Mip-Splatting surpasses both 3DGS and 3DGS + EWA at lower resolutions.
Figure 5

Single-scale Training and Multi-scale Testing on the Mip-NeRF 360 Dataset. All models are trained on images downsampled by a factor of eight and rendered at full resolution to demonstrate zoom-in/moving closer effects.
In contrast to prior work, Mip-Splatting renders images that closely approximate ground truth. Please also note the high-frequency artifacts of 3DGS + EWA.
Figure 6

Single-scale Training and Multi-scale Testing on the Mip-NeRF 360 Dataset. All models are trained on images downsampled by a factor of eight and rendered at full resolution to demonstrate zoom-in/moving closer effects.
Removing the 3D smoothing filter results in high-frequency artifacts.
Mip-Splatting renders images that closely approximate ground truth. Zoom in for a better view.
Figure 7

Single-scale Training and Multi-scale Testing on the Mip-NeRF 360 Dataset. All methods are trained at 1× resolution and evaluated at different resolutions to mimic zoom-out (1/4× and 1/2×) and zoom-in (2× and 4×). Mip-Splatting surpasses both 3DGS [18] and 3DGS + EWA [59] across different resolutions.
Removing the 3D smoothing filter leads to high-frequency artifacts when zooming in, while omitting the 2D Mip filter results in aliasing artifacts when zooming out.
Limitations
Our method employs a Gaussian filter as an approximation to a box filter for efficiency. However, this approximation introduces errors, particularly when the Gaussian is small in screen space. This is consistent with our experimental findings, where stronger zoom-out leads to larger errors.
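This error is easy to reproduce: convolving a sub-pixel signal with a one-pixel box filter versus a Gaussian of matched variance (1/12, the variance of a unit box) yields clearly different profiles, whereas for a wide signal the two nearly coincide. A small NumPy illustration with arbitrary signal widths:

```python
import numpy as np

x = np.linspace(-4, 4, 4001)                     # screen-space axis (pixels)
dx = x[1] - x[0]
box = (np.abs(x) <= 0.5).astype(float)           # one-pixel box filter
gauss = np.exp(-x**2 * 6) / np.sqrt(2 * np.pi / 12)  # Gaussian, variance 1/12

for sigma in (0.1, 2.0):                         # narrow vs wide signal
    signal = np.exp(-x**2 / (2 * sigma**2))
    out_box = np.convolve(signal, box, mode="same") * dx
    out_gauss = np.convolve(signal, gauss, mode="same") * dx
    err = np.abs(out_box - out_gauss).max() / out_box.max()
    print(f"sigma = {sigma}: relative error {err:.3f}")
# Narrow (sub-pixel) signals show a large discrepancy; wide signals
# make the two filters nearly indistinguishable.
```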
Additionally, there is a slight increase in training overhead, as the sampling rate of each 3D Gaussian must be recomputed every m = 100 iterations. Currently, this computation is performed in PyTorch; a more efficient CUDA implementation could reduce this overhead.
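A minimal PyTorch sketch of this periodic computation, assuming per-camera world-to-camera matrices and focal lengths in pixels; the frustum test is reduced to a near-plane depth check and all names are illustrative:

```python
import torch

@torch.no_grad()
def max_sampling_rate(means, w2c, focals, near=0.2):
    """means: (P, 3) Gaussian centers; w2c: (C, 4, 4) world-to-camera
    transforms; focals: (C,) focal lengths in pixels.
    Returns the (P,) maximal sampling rate f / d over all cameras that
    see each point (frustum test reduced to a near-plane depth check)."""
    homo = torch.cat([means, torch.ones_like(means[:, :1])], dim=-1)
    cam = torch.einsum('cij,pj->cpi', w2c, homo)       # (C, P, 4)
    depth = cam[..., 2]                                # (C, P)
    rate = focals[:, None] / depth.clamp(min=1e-8)     # f / d per camera
    rate = torch.where(depth > near, rate, torch.zeros_like(rate))
    return rate.max(dim=0).values
```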
Designing a better data structure for precomputing and storing the sampling rate, which depends solely on the camera poses and intrinsics, is an avenue for future work. As mentioned before, the sampling rate computation is only required during training; the 3D smoothing filter can be fused with the Gaussian primitives, thereby eliminating any additional overhead during rendering.
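Concretely, because the sampling rate is fixed once training ends, this fusion can be a one-time post-processing step; a sketch reusing the hypothetical helpers from the earlier code:

```python
# One-time fusion after training (sketch, reusing the hypothetical
# helpers above): bake the filtered covariance and opacity into the
# stored primitives, so rendering needs no extra computation.
rate = max_sampling_rate(means, w2c, focals)
cov_fused, op_fused = smooth_3d(cov3d, opacity, rate)
torch.save({"cov3d": cov_fused, "opacity": op_fused}, "fused_gaussians.pt")
```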
Conclusion
We presented Mip-Splatting, a modification of 3D Gaussian Splatting that introduces two novel filters, namely a 3D smoothing filter and a 2D Mip filter, to achieve alias-free rendering at arbitrary scales.
Our 3D smoothing filter effectively limits the maximal frequency of the Gaussian primitives to match the sampling constraints imposed by the training images, while the 2D Mip filter approximates a box filter to simulate the physical imaging process.
Our experimental results demonstrate that Mip-Splatting is competitive with state-of-the-art methods when training and testing at the same scale / sampling rate. Importantly, it significantly outperforms state-of-the-art methods in out-of-distribution scenarios, i.e., when testing at sampling rates that differ from training, generalizing better to novel camera poses and zoom factors.
