
StopThePop: Sorted Gaussian Splatting for View-Consistent Real-time Rendering


Abstract

3D Gaussian Splatting (3DGS) has emerged as a prominent model for constructing 3D representations from images across diverse domains. However, the efficiency of the 3DGS rendering pipeline relies on several simplifications. Notably, reducing Gaussians to 2D splats with a single view-space depth introduces popping and blending artifacts during view rotation. Addressing this issue requires accurate per-pixel depth computation, yet a full per-pixel sort is far more costly than a single global sort.

In this paper, we present a novel hierarchical rasterization approach that systematically resorts and culls splats with minimal processing overhead.

Our software rasterizer effectively eliminates popping artifacts and view inconsistencies, as demonstrated through both quantitative and qualitative measurements. At the same time, our method removes the possibility of faking view-dependent effects through popping, ensuring a more authentic representation. Despite eliminating this cheating, our approach achieves comparable quantitative results on test images, while increasing consistency for novel view synthesis in motion.

Due to its design, our hierarchical approach is on average only 4% slower than the original Gaussian Splatting. Notably, enforcing consistency makes it possible to reduce the number of Gaussians by approximately half with nearly identical quality and view consistency. Consequently, rendering performance nearly doubles, making our approach 1.6× faster than 3DGS with a 50% reduction in memory requirements.

Figures

Figure 1

3DGS suffers from popping artifacts during view rotation due to its approximate, global sorting scheme.

Our method effectively circumvents short-range popping artifacts and long-range view inconsistencies during rotation with a novel, hierarchical per-pixel sorting strategy.

Figure 2

Effect of collapsing 3D Gaussians into 2D splats and 3DGS’s depth simplification

(a) Integrating Gaussians along view rays r requires careful consideration of potentially overlapping 1D Gaussians.

(b) Using flattened 2D splats and view-space z as depth (the projection of u onto V) places 2D splats on spherical segments around the camera, inverting the relative positions of the two Gaussians along the example view ray.

(c) Camera rotation inverts the order along r, resulting in popping.

(d) Camera translation does not alter the distance compared to (b).
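A small 2D sketch makes panels (b) and (c) concrete. It sorts two hypothetical splat centers by view-space z (the projection of the center offset onto the viewing direction), as a 3DGS-style global sort does, and shows that a pure camera rotation flips their order. The positions and the 45° angle are made up for illustration, and the sketch treats each Gaussian as a point, ignoring its extent.

```python
import math

def view_space_z(point, cam_pos, view_angle):
    """Depth of `point` as the projection of (point - cam_pos) onto the unit
    viewing direction at `view_angle` radians (2D, 3DGS-style view-space z)."""
    dx, dy = point[0] - cam_pos[0], point[1] - cam_pos[1]
    return dx * math.cos(view_angle) + dy * math.sin(view_angle)

def front_to_back(points, cam_pos, view_angle):
    """Indices of `points` sorted front to back by view-space z."""
    return sorted(range(len(points)),
                  key=lambda i: view_space_z(points[i], cam_pos, view_angle))

# Two hypothetical splat centers and a camera at the origin.
a = (4.0, 0.0)   # distance 4.0 from the camera
b = (3.0, 3.0)   # distance ~4.24 from the camera
cam = (0.0, 0.0)

print(front_to_back([a, b], cam, 0.0))                 # [1, 0]: b sorted in front
print(front_to_back([a, b], cam, math.radians(45.0)))  # [0, 1]: order flips
```

The camera position never changes; only its orientation does, yet the blend order of the two splats swaps, which is the popping shown in panel (c).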

Figure 3

Our approach to computing t_{opt} avoids popping by placing splats at the point of maximum contribution along the view ray r, creating sort orders that are independent of camera rotation (red view vector).

Note that the shape of t_{opt} is a curved surface and changes with the camera position.
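For a single 3D Gaussian with mean μ and covariance Σ, and a ray o + t·d, the contribution along the ray is proportional to exp(-½ (o + t d - μ)^T Σ^{-1} (o + t d - μ)). Setting the derivative of the exponent with respect to t to zero gives the standard closed form t_{opt} = d^T Σ^{-1} (μ - o) / (d^T Σ^{-1} d). The sketch below evaluates this formula; the symbols o, d, μ, Σ and the function names are our own notation, and the precision matrix Σ^{-1} is assumed to be given (3DGS stores a rotation and scale per Gaussian, from which it is cheap to form). It illustrates the math, not the authors' CUDA implementation.

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def mat_vec(m, v):
    return [dot(row, v) for row in m]

def t_opt(origin, direction, mean, precision):
    """Ray parameter t maximizing the Gaussian along origin + t * direction.

    precision: symmetric 3x3 inverse covariance Sigma^{-1} (nested lists).
    t_opt = d^T P (mu - o) / (d^T P d); t is measured in units of |direction|.
    """
    diff = [m - o for m, o in zip(mean, origin)]
    p_d = mat_vec(precision, direction)  # (P d)^T equals d^T P because P is symmetric
    return dot(p_d, diff) / dot(p_d, direction)

origin = [0.0, 0.0, 0.0]
direction = [1.0, 0.0, 0.0]
mean = [5.0, 1.0, 0.0]

isotropic = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
tilted = [[1.0, 0.5, 0.0], [0.5, 1.0, 0.0], [0.0, 0.0, 1.0]]  # positive definite

print(t_opt(origin, direction, mean, isotropic))  # 5.0: the projection of the center
print(t_opt(origin, direction, mean, tilted))     # 5.5: shifted by the anisotropy
```

For an isotropic Gaussian, t_{opt} coincides with the projection of the center onto the ray; for an anisotropic one it does not, which is why the t_{opt} surface is curved and depends on the camera position.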

Figure 4


Correct rendering of a trained 3DGS scene with per-pixel sorting reveals how 3DGS cheats with the location of Gaussians. In contrast, our approach accounts for correct sorting during both training and rendering.

We show the sort error of different resorting windows and of our full approach.

We intentionally visualize the trained 3DGS model here, as our own trained model does not exhibit these kinds of artifacts.

The error visualization captures the sum over the depth difference of all wrongly sorted neighbors.

For resorting with a window size of 4, tile artifacts are still visible. Our approach hardly diverges from fully sorted rendering while running 100× faster; it is also about 5× faster than resorting with a window size of 24, and on average only 4% slower than 3DGS.
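One way to read the caption's error measure: walk a pixel's Gaussians in the order they are blended and, for every adjacent pair whose depths are out of front-to-back order, add the depth difference. The function below is a minimal sketch of that reading; interpreting "neighbors" as consecutive entries in blend order is our assumption, and the depth values are illustrative.

```python
def sort_error(depths_in_blend_order):
    """Sum of depth differences over adjacent pairs blended out of order.

    Assumes 'neighbors' are consecutive entries in the blend order; a
    correctly sorted (non-decreasing) list has zero error.
    """
    pairs = zip(depths_in_blend_order, depths_in_blend_order[1:])
    return sum(earlier - later for earlier, later in pairs if earlier > later)

print(sort_error([1.0, 2.0, 3.0, 4.0]))  # 0: correctly sorted
print(sort_error([1.0, 3.0, 2.0, 4.0]))  # 1.0: one swapped pair
print(sort_error([4.0, 3.0, 2.0, 1.0]))  # 3.0: fully reversed
```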

Figure 5

Comparison of 3DGS with and without per-tile depth calculation.

Per-tile depth calculation lowers sorting errors (Δ_{max} = 4.01, Δ_{avg} = 0.284, compared to Δ_{max} = 5.43, Δ_{avg} = 0.898). However, doing this without additional per-pixel sorting leads to artifacts at the tile borders.

Figure 6

Number of Gaussians per tile with and without tile-based culling. Culling reduces the average number of Gaussians per tile by ∼44%.
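Tile-based culling drops a Gaussian from a tile's list when it cannot contribute a visible alpha anywhere inside that tile. The sketch below is a hypothetical illustration of one exact test, not the authors' kernel: it maximizes the projected 2D Gaussian over an axis-aligned tile. If the mean lies inside the tile, the maximum is at the mean; otherwise the minimum of the convex quadratic exponent lies on the tile border, so each edge is minimized in closed form. The conic (a, b, c) stands for the inverse 2D covariance [[a, b], [b, c]], and using the 1/255 cutoff that 3DGS applies when blending as the culling threshold is an assumption.

```python
import math

ALPHA_THRESHOLD = 1.0 / 255.0  # assumed culling cutoff

def quad(conic, dx, dy):
    """Quadratic form [dx, dy] A [dx, dy]^T for A = [[a, b], [b, c]]."""
    a, b, c = conic
    return a * dx * dx + 2.0 * b * dx * dy + c * dy * dy

def min_on_segment(conic, mean, p0, p1):
    """Minimum of the convex quadratic along the segment p0 -> p1."""
    a, b, c = conic
    ex, ey = p1[0] - p0[0], p1[1] - p0[1]
    dx, dy = p0[0] - mean[0], p0[1] - mean[1]
    # q(s) = quad(d + s*e) = quad(d) + 2*s*(e^T A d) + s^2 * quad(e)
    qa = quad(conic, ex, ey)
    qb = a * ex * dx + b * (ex * dy + ey * dx) + c * ey * dy
    s = 0.0 if qa <= 0.0 else min(1.0, max(0.0, -qb / qa))  # clamp to segment
    return quad(conic, dx + s * ex, dy + s * ey)

def max_alpha_in_tile(opacity, conic, mean, x0, y0, x1, y1):
    """Largest alpha = opacity * exp(-0.5 * quad) over the tile [x0, x1] x [y0, y1]."""
    if x0 <= mean[0] <= x1 and y0 <= mean[1] <= y1:
        return opacity
    corners = [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]
    q_min = min(min_on_segment(conic, mean, corners[i], corners[(i + 1) % 4])
                for i in range(4))
    return opacity * math.exp(-0.5 * q_min)

def keep_for_tile(opacity, conic, mean, tile):
    return max_alpha_in_tile(opacity, conic, mean, *tile) >= ALPHA_THRESHOLD

conic = (1.0, 0.0, 1.0)  # unit-variance isotropic splat
mean = (0.0, 0.0)
print(keep_for_tile(1.0, conic, mean, (-8.0, -8.0, 8.0, 8.0)))    # True: mean inside
print(keep_for_tile(1.0, conic, mean, (1.0, -8.0, 17.0, 8.0)))    # True: alpha = exp(-0.5)
print(keep_for_tile(1.0, conic, mean, (20.0, 20.0, 36.0, 36.0)))  # False: culled
```

A cheaper but looser alternative is to cull against the splat's bounding box; an exact per-tile maximum removes more entries at the cost of a few extra operations per Gaussian-tile pair.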

Figure 7

Overview of the detailed steps in our pipeline.

We add load balancing, tile culling, and per-tile depth evaluation to the first two stages of 3DGS.

Our hierarchical rasterizer uses three sorted queues, going from 4×4 tiles over 2×2 tiles to individual rays.

The queues store only the ID and the tile’s t_{opt} per Gaussian, while additional information is re-fetched from global memory on demand and shared between threads via shuffle operations.

Depending on the queue fill levels, we switch between different cooperative group sizes while ensuring that the queues remain filled for effective sorting.

Our pipeline achieves an overall sorting window of 25-72 elements.
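The idea behind bounded resorting can be sketched for a single pixel: Gaussians arrive in the approximate order of the global sort, a small buffer keeps them ordered by per-pixel depth, and whenever the buffer overflows, its front-most entry is emitted for blending. This sketch uses a binary heap as the buffer and a single window level; it does not model the 4×4 / 2×2 / per-ray queue hierarchy, cooperative groups, or shuffle operations of the GPU rasterizer, and the input depths are illustrative.

```python
import heapq

def resort_with_window(stream, window):
    """Reorder (depth, gaussian_id) pairs that arrive approximately sorted.

    Keeps at most `window` entries buffered in a min-heap; each time the
    buffer overflows, the closest entry is emitted. Returns the emitted order.
    """
    heap, out = [], []
    for depth, gid in stream:
        heapq.heappush(heap, (depth, gid))
        if len(heap) > window:
            out.append(heapq.heappop(heap))
    while heap:  # flush the remaining buffered entries front to back
        out.append(heapq.heappop(heap))
    return out

stream = [(2.0, 0), (1.0, 1), (4.0, 2), (3.0, 3), (6.0, 4), (5.0, 5)]
print([gid for _, gid in resort_with_window(stream, window=2)])  # [1, 0, 3, 2, 5, 4]

# A window that is too small can still emit out of order.
late = [(3.0, 0), (2.0, 1), (1.0, 2)]
print([d for d, _ in resort_with_window(late, window=1)])  # [2.0, 1.0, 3.0]
print([d for d, _ in resort_with_window(late, window=2)])  # [1.0, 2.0, 3.0]
```

The second example also shows why resorting alone cannot guarantee the correct blend order: an entry displaced further than the window size stays out of order.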

Figure 8

Visualization of our proposed popping detection method with detailed views inset.

We warp view F_{i} to \hat{F}_{i+1} and \hat{F}_{i+7} using optical flow and use FLIP to measure errors between warped and non-warped views.

FLIP_{1} effectively detects popping artifacts, but it only accumulates errors over a single frame.

FLIP_{7} accumulates errors due to popping over multiple frames, making this metric more reliable.
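The structure of this popping metric can be sketched as: warp an earlier frame into a later one with optical flow, then compare the warped frame against the frame actually rendered there, ignoring pixels without a valid source. The sketch below is a simplified illustration on grayscale images stored as nested lists. It assumes a backward flow (from the later frame to the earlier one) is given, uses nearest-neighbour sampling, and replaces FLIP with a plain mean absolute difference, since FLIP itself is a perceptual metric that is not reproduced here. For t = 7, the flow would map frame i+7 back to frame i.

```python
def warp_backward(src, flow):
    """Nearest-neighbour backward warp.

    flow[y][x] = (fx, fy) is the offset from target pixel (x, y) to its
    corresponding location in `src`; out-of-bounds sources become None.
    """
    h, w = len(src), len(src[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            sx = round(x + flow[y][x][0])
            sy = round(y + flow[y][x][1])
            row.append(src[sy][sx] if 0 <= sx < w and 0 <= sy < h else None)
        out.append(row)
    return out

def popping_error(rendered, warped):
    """Mean absolute difference over valid pixels (a stand-in for FLIP)."""
    diffs = [abs(r - wv)
             for r_row, w_row in zip(rendered, warped)
             for r, wv in zip(r_row, w_row) if wv is not None]
    return sum(diffs) / len(diffs) if diffs else 0.0

# A 1x4 image; the camera pans so target pixel x shows source pixel x + 1.
frame_i = [[0.0, 1.0, 2.0, 3.0]]
flow = [[(1.0, 0.0)] * 4]
warped = warp_backward(frame_i, flow)
print(warped)  # [[1.0, 2.0, 3.0, None]]

consistent = [[1.0, 2.0, 3.0, 4.0]]
popped = [[1.0, 2.0, 9.0, 4.0]]  # one pixel changes abruptly
print(popping_error(consistent, warped))  # 0.0
print(popping_error(popped, warped))      # 2.0
```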

Figure 9

Comparison between FLIP and MSE for measuring differences between rendered frames F_{i+1} and warped frames \hat{F}_{i+1} for 3DGS.

Notably, MSE does not yield large errors even when disturbing popping artifacts are present.
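A toy calculation shows why a mean-based metric hides localized popping: MSE averages the squared error over every pixel, so a small patch that changes abruptly contributes almost nothing. The images below are synthetic, and the maximum absolute difference is printed only as a contrast; neither number is FLIP.

```python
def mse(a, b):
    """Mean squared error between two equally sized grayscale images."""
    n = len(a) * len(a[0])
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / n

def max_abs_diff(a, b):
    """Largest per-pixel absolute difference."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

h = w = 100
warped = [[0.5] * w for _ in range(h)]
rendered = [row[:] for row in warped]
for y in range(10, 13):  # a 3x3 patch "pops" to a brighter value
    for x in range(10, 13):
        rendered[y][x] = 0.9

print(mse(rendered, warped))           # about 1.44e-4 (9 * 0.4**2 / 10000)
print(max_abs_diff(rendered, warped))  # about 0.4
```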

Figure 10

Image comparisons of our method and 3DGS. In most configurations, our rendered images are virtually indistinguishable from 3DGS.

Figure 11

Per-frame FLIP_{t} scores for t ∈ {1, 7} for a complete video sequence from the Garden scene. Popping in 3DGS causes significant peaks, as can be seen in the results for FLIP_{1}.

Figure 12

3DGS can fake view-dependent effects with popping.

We slightly rotate test set views; 3DGS’s results are significantly less consistent than ours. We increase contrast for zoomed-in views and include a FLIP view for easier comparison.

Figure 13

Average per-scene user study score.

A positive score indicates a preference for our method, whereas a negative score indicates a preference for 3DGS. Our method clearly outperforms 3DGS.

CONCLUSION, LIMITATIONS, AND FUTURE WORK

In this paper, we took a closer look at the way 3DGS orders splats during blending. A detailed analysis of the splats’ depth computation revealed the cause of the popping artifacts in 3DGS: the computed depth is highly inconsistent under rotation.

A per-ray depth computation that treats the point of highest contribution along the ray as the optimal blending depth removes all popping artifacts, but is 100× more costly. With our hierarchical renderer, which includes multiple culling and resorting stages, we are on average only 1.04× slower than 3DGS.

While popping is difficult to identify with standard quality metrics, we provided a view-consistency metric based on optical flow and FLIP, which shows that our approach significantly reduces popping.

We also confirmed this in a user study and provided an additional metric showing increased view consistency and more accurate depth estimates for our method. Furthermore, our approach remains view-consistent even when the scene is constructed with half the Gaussians, a configuration in which 3DGS shows a significant increase in popping artifacts. As such, in this configuration our approach reduces memory by 2× and renders 1.6× faster than 3DGS, while reducing popping artifacts and achieving virtually indistinguishable quality.

While our approach typically removed all artifacts in our tests, resorting does not guarantee the correct blend order and thus could still lead to popping or flickering for very complex geometric relationships.

Furthermore, our approach still ignores overlaps between Gaussians along the view ray. A fully correct volume rendering of Gaussians might not only remove artifacts completely but could also lead to better scene reconstructions.
