Visual comparisons for novel-view synthesis. Compared with existing state-of-the-art 4D reconstruction methods, Deblur4DGS produces photo-realistic results from blurry monocular video while maintaining real-time rendering speed.
Abstract
Recent 4D reconstruction methods have yielded impressive results but rely on sharp videos for supervision. However, motion blur often arises in videos due to camera shake and object movement, and existing methods render blurry results when reconstructing 4D models from such videos. Although a few approaches have attempted to address this problem, they struggle to produce high-quality results because of the inaccuracy of estimating continuous dynamic representations within the exposure time. Encouraged by recent works on 3D motion trajectory modeling with 3D Gaussian Splatting (3DGS), we adopt 3DGS as the scene representation and propose Deblur4DGS to reconstruct a high-quality 4D model from blurry monocular video. Specifically, we transform the estimation of continuous dynamic representations within an exposure time into the estimation of the exposure time itself. Moreover, we introduce an exposure regularization term as well as multi-frame and multi-resolution consistency regularization terms to avoid trivial solutions. Furthermore, to better represent objects with large motion, we suggest blur-aware variable canonical Gaussians. Beyond novel-view synthesis, Deblur4DGS can be applied to improve blurry video from multiple perspectives, including deblurring, frame interpolation, and video stabilization. Extensive experiments on both synthetic and real-world data across the above four tasks show that Deblur4DGS outperforms state-of-the-art 4D reconstruction methods.
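Concretely, and anticipating the method figure below, Deblur4DGS synthesizes each blurry frame by averaging the \(N\) latent sharp images rendered within its exposure time, and supervises this average against the captured frame. Writing the photometric distance as \(\rho\) (its exact composition is an assumption here, not specified above):

\[
\hat{\mathbf{B}}_{t} = \frac{1}{N}\sum_{i=1}^{N}\hat{\mathbf{I}}_{t,i},
\qquad
\mathcal{L}_{rec} = \rho\!\left(\hat{\mathbf{B}}_{t},\,\mathbf{B}_{t}\right).
\]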
Method
(a) Training of Deblur4DGS.
When processing the \(t\)-th frame, we first discretize its exposure time into \(N\) timestamps.
Then, we estimate continuous camera poses \(\{\mathbf{P}_{t,i}\}_{i=1}^{N}\) and dynamic Gaussians \(\{\mathbf{D}_{t,i}\}_{i=1}^{N}\) within the exposure time.
Next, we render each latent sharp image \(\hat{\mathbf{I}}_{t,i}\) with the camera pose \(\mathbf{P}_{t,i}\), dynamic Gaussians \(\mathbf{D}_{t,i}\), and static Gaussians \(\mathbf{S}\).
Finally, \(\{\hat{\mathbf{I}}_{t,i}\}_{i=1}^{N}\) are averaged to obtain the synthetic blurry image \(\hat{\mathbf{B}}_{t}\), which is used to calculate the reconstruction loss \(\mathcal{L}_{rec}\) with the given blurry frame \(\mathbf{B}_{t}\).
To regularize the under-constrained optimization, we introduce the exposure regularization \(\mathcal{L}_{e}\), multi-frame consistency regularization \(\mathcal{L}_{mfc}\), and multi-resolution consistency regularization \(\mathcal{L}_{mrc}\).
(b) Rendering of Deblur4DGS. Deblur4DGS renders a sharp image at a user-provided timestamp \(t\) and camera pose \(\mathbf{P}_{t}\).
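To make the training and rendering steps above concrete, here is a minimal PyTorch sketch. The rasterizer (`render`), the deformation field (`dynamic_field`), the exposure-endpoint poses `P_start`/`P_end`, the linear pose blending, the default `N=9`, and the L1 placeholder loss are all illustrative assumptions rather than the authors' implementation, and the regularization terms \(\mathcal{L}_{e}\), \(\mathcal{L}_{mfc}\), and \(\mathcal{L}_{mrc}\) are omitted:

```python
# Minimal sketch of the Deblur4DGS training and rendering steps in the figure.
# The rasterizer, deformation field, and exposure-endpoint poses are
# hypothetical stand-ins, not the authors' actual implementation.
import torch

def interpolate_poses(P_start, P_end, weights):
    """Blend camera poses within the exposure window.
    P_start/P_end: (4, 4) matrices at the exposure endpoints. A production
    system would interpolate on SE(3) (e.g., SLERP for rotation); linear
    blending of matrix entries is a simplification for this sketch."""
    return [(1 - w) * P_start + w * P_end for w in weights]

def training_step(B_t, P_start, P_end, dynamic_field, static_gaussians,
                  render, N=9):
    # (1) Discretize the exposure time of frame t into N timestamps in [0, 1].
    taus = torch.linspace(0.0, 1.0, N)
    # (2) Continuous camera poses {P_{t,i}} and dynamic Gaussians {D_{t,i}}.
    poses = interpolate_poses(P_start, P_end, taus)
    dyn = [dynamic_field(tau) for tau in taus]
    # (3) Render each latent sharp image from dynamic + static Gaussians.
    latents = [render(D, static_gaussians, P) for D, P in zip(dyn, poses)]
    # (4) Average the latent sharp images into the synthetic blurry image.
    B_hat = torch.stack(latents).mean(dim=0)
    # (5) Reconstruction loss against the captured blurry frame (L1 here is
    # a placeholder; the exact loss composition is not given in the caption).
    return torch.nn.functional.l1_loss(B_hat, B_t)

@torch.no_grad()
def render_sharp(t, P_t, dynamic_field, static_gaussians, render):
    # Rendering (part b): one sharp image at a user-provided timestamp and
    # camera pose -- no averaging is needed at inference time.
    return render(dynamic_field(t), static_gaussians, P_t)
```

Note that the averaging happens only when synthesizing the training target; inference in part (b) remains a single rasterization pass, which is how the real-time rendering speed is preserved.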
Novel-View Synthesis
Deblur4DGS produces more visually pleasing results in both static and dynamic areas, marked with yellow and red boxes respectively.
Compared with 4D-reconstruction-based methods, Deblur4DGS produces sharper content and fewer visual artifacts in both static and dynamic areas, marked with yellow and red boxes respectively.