
Periodic Vibration Gaussian: Dynamic Urban Scene Reconstruction and Real-time Rendering


Abstract

Modeling dynamic, large-scale urban scenes is challenging due to their highly intricate geometric structures and unconstrained dynamics in both space and time. Prior methods often employ high-level architectural priors, separating static and dynamic elements, resulting in suboptimal capture of their synergistic interactions.

To address this challenge, we present a unified representation model, called Periodic Vibration Gaussian (PVG). PVG builds upon efficient 3D Gaussian Splatting (3DGS), originally designed for static scene representation, by introducing periodic vibration-based temporal dynamics. This innovation enables PVG to elegantly and uniformly represent the characteristics of various objects and elements in dynamic urban scenes.
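
To make this concrete, here is a minimal sketch of the idea, assuming a sinusoidal vibration around a life peak $\tau$ with amplitude $A$ and cycle length $l$, plus a Gaussian opacity decay with lifespan $\beta$; these parameter names are illustrative, not the paper's exact formulation:

```python
import numpy as np

def pvg_mean(mu, A, tau, l, t):
    """Time-dependent mean of a PVG point (sketch): the rest position `mu`
    vibrates sinusoidally with amplitude `A` and cycle length `l`
    around the life peak `tau`."""
    return mu + A * np.sin(2.0 * np.pi * (t - tau) / l)

def pvg_opacity(opacity0, tau, beta, t):
    """Time-dependent opacity (sketch): maximal at the life peak `tau` and
    decaying with lifespan `beta`, so short-lived points explain
    dynamic content and long-lived points explain static content."""
    return opacity0 * np.exp(-0.5 * ((t - tau) / beta) ** 2)
```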

To enhance temporally coherent representation learning of large scenes with sparse training data, we introduce a novel temporal smoothing mechanism and a position-aware adaptive control strategy, respectively.

Extensive experiments on the Waymo Open Dataset and KITTI benchmarks demonstrate that PVG surpasses state-of-the-art alternatives in both reconstruction and novel view synthesis for both dynamic and static scenes.

Notably, PVG achieves this without relying on manually labeled object bounding boxes or expensive optical flow estimation. Moreover, PVG exhibits a 900-fold acceleration in rendering over the best alternative.

Figures

Figure 1

Our proposed Periodic Vibration Gaussian is crafted to effectively and uniformly capture both the static and dynamic elements of a large, dynamic urban scene.

(a) It not only reconstructs a dynamic urban scene but also enables real-time rendering, while efficiently isolating dynamic components from the intricacies of the highly unconstrained and complex scene.

(b) This capability facilitates flexible manipulation, such as the removal of dynamic scene elements.

Figure 2

(a) Dynamic and static elements can be expressed uniformly through Periodic Vibration Gaussian points with different lifespans. Our model learns to differentiate between dynamic and static elements on its own.

(b) Our training pipeline.

Figure 3

Our temporal smoothing mechanism.

We query the status of the PVG point set at $t-\Delta t$ and translate each point by its 3D flow translation $\bar{v}\cdot\Delta t$; we then render the translated set of points for supervision at timestamp $t$.
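
A minimal sketch of this step (the accessors `position_at` and `average_velocity` and the renderer `render_fn` are hypothetical names, not the paper's API):

```python
import torch
import torch.nn.functional as F

def temporal_smoothing_loss(gaussians, render_fn, t, dt, image_t):
    """Query the PVG point set at t - dt, translate each point by its
    averaged 3D flow v_bar * dt, then render the translated points and
    supervise against the ground-truth frame captured at time t."""
    means = gaussians.position_at(t - dt)       # hypothetical accessor
    v_bar = gaussians.average_velocity(t - dt)  # hypothetical accessor
    translated = means + v_bar * dt             # scene-flow translation
    rendered = render_fn(translated, gaussians, t)
    return F.l1_loss(rendered, image_t)
```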

Figure 4

Novel view synthesis of dynamic scenes on Waymo.

Figure 5

Novel view synthesis of dynamic scenes on KITTI.

Figure 6

Novel view synthesis on static scene on Waymo.

Figure 7

PVG vs. EmerNeRF

Figure 8

PVG Rendered RGB, depth and semantic label.

Figure 9

Temporal smoothing (a) on and (b) off.

Figure 10

Dynamics models: (a) Constant, (b) Linear, (c) Ours, with PSNR = 27.77 / 27.09 / 28.11, respectively.
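
The three dynamics models compared here can be sketched as follows; the vibration parameters ($A$, $\tau$, $l$) are illustrative assumptions:

```python
import numpy as np

def constant_motion(mu, t):
    # (a) Constant: the point never moves.
    return mu

def linear_motion(mu, v, t, t0=0.0):
    # (b) Linear: the point drifts with a fixed velocity v.
    return mu + v * (t - t0)

def periodic_vibration(mu, A, tau, l, t):
    # (c) Ours (sketch): a sine vibration around the life peak tau, which,
    # combined with opacity decay, yields the piecewise behavior
    # illustrated in Figure 12.
    return mu + A * np.sin(2.0 * np.pi * (t - tau) / l)
```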

Figure 11

More visualization of the ablation study.

(a) Effect of the depth loss $\mathcal{L}_{d}$ and the sky module.

(b) Effect of temporal smoothing for motion trend (car, pedestrian, and shadow) and the sparse velocity loss $\mathcal{L}_{\bar{v}}$ for enabling static component modeling. We render the average velocity map $\bar{\mathcal{V}}$ and color its $uv$ component by optical flow color coding (see the sketch after this caption).

(c) Effect of Position-Aware Control (PAC) in improving the depth of distant components.
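
For reference, the optical-flow color coding mentioned in (b) is commonly implemented by mapping direction to hue and magnitude to saturation; the following is a generic sketch, not the authors' visualization code:

```python
import numpy as np
from matplotlib.colors import hsv_to_rgb

def flow_to_color(flow_uv):
    """Map a (H, W, 2) velocity field to RGB: hue encodes direction,
    saturation encodes normalized magnitude."""
    u, v = flow_uv[..., 0], flow_uv[..., 1]
    magnitude = np.sqrt(u ** 2 + v ** 2)
    angle = np.arctan2(v, u)                       # direction in radians
    hsv = np.zeros(flow_uv.shape[:2] + (3,))
    hsv[..., 0] = (angle + np.pi) / (2.0 * np.pi)  # hue in [0, 1]
    hsv[..., 1] = magnitude / (magnitude.max() + 1e-8)
    hsv[..., 2] = 1.0
    return hsv_to_rgb(hsv)
```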

Figure 12

Consider a 4D space-time coordinate system with a non-zero slope for dynamic objects and zero slope for static ones.

Every PVG point's trajectory is characterized by a piecewise sine function with a specific domain of definition and amplitude.

PVG points with a small staticness coefficient $\rho$ (red points) and short lifespans learn to model dynamic scene parts, while PVG points with a large $\rho$ (green points) and long lifespans explain static scene parts.

To represent unconstrained motion (e.g., a moving car), a collection of PVG points works together as a cohort.

Figure 13

Consider two adjacent training frames with timestamps $t_1$ and $t_2$ (a small time window).

In a small time period, we assume dynamic objects move linearly. At the observed times $t_1$ and $t_2$, the RGB renderings can fit well. However, for the moments in between ($t_1 < t_b < t_2$), we have no corresponding training data to constrain our model $\mathcal{H}_i$.

To prevent our model from behaving improperly, we impose a smoothness constraint on the slope $v$. The frames used for training are knots of a function; what we need to do is connect these knots more smoothly.
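
One way to realize such a constraint, sketched under the linear-motion assumption above (`model_position` is a hypothetical callable, and this is not the paper's exact loss):

```python
import numpy as np

def knot_consistency_penalty(model_position, mu_t1, mu_t2, t1, t2, n_samples=8):
    """Between two observed 'knots' we assume linear motion, so the model's
    position at unobserved intermediate times t_b should stay close to the
    straight line joining the knots."""
    penalty = 0.0
    for t_b in np.linspace(t1, t2, n_samples + 2)[1:-1]:  # strictly interior
        u = (t_b - t1) / (t2 - t1)
        linear = (1.0 - u) * mu_t1 + u * mu_t2            # linear interpolant
        penalty += np.mean((model_position(t_b) - linear) ** 2)
    return penalty / n_samples
```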

Figure 14

(a) A scene with a left-to-right moving car and two walking pedestrians.

(b) Visualization of the velocity map of the scene.

(c) Visualization of the $\rho$ map of the scene.

Figure 15

Rendered images under different camera settings.

It is evident that our model captures the dynamic parts (including even the car's shadow) and the static parts of the scene. In the $\rho$ map, blue/red indicates large/small $\rho$, corresponding to static/dynamic areas.

Figure 16

Qualitative results of novel view synthesis on Waymo. GT: Ground-truth.

Figure 17

Qualitative results of novel view synthesis on Waymo.

Figure 18

Qualitative results of image reconstruction on KITTI.

Figure 19

Qualitative results of novel view synthesis on KITTI.

Conclusion

We present the Periodic Vibration Gaussian (PVG), a model adept at capturing the diverse characteristics of various objects and materials within dynamic urban scenes in a unified formulation.

By integrating periodic vibration, time-dependent opacity decay, and a scene flow-based temporal smoothing mechanism into 3DGS, we have established that our model significantly outperforms state-of-the-art methods on the Waymo Open Dataset and KITTI benchmarks, with a significant efficiency advantage in dynamic scene reconstruction and novel view synthesis.

While PVG excels in managing dynamic scenes, it encounters limitations in precise geometric representation, attributable to its highly adaptable design. Future efforts will focus on improving geometric accuracy and further refining the model's proficiency in accurately depicting the complexities of urban scenes.

Limitation

Unlike neural-network-based representation models, our PVG uses independent, discrete points to represent a scene. Despite advantages including high flexibility, simple composition, and strong expressive and fitting ability, the independence of each point makes it more difficult to model smoothness over time and space. Our temporal smoothing mechanism does enhance the correlation between points, but it does not fully solve this problem. More fine-grained, dedicated designs are needed for further enhancement.
