Street Gaussians without 3D Object Tracker (ICCV 2025)
- Ruida Zhang (1, 2)
- Chengxi Li (1)
- Chenyangguang Zhang (1)
- Xingyu Liu (1)
- Haili Yuan (1)
- Yanyan Li (2)
- Xiangyang Ji (1)
- Gim Hee Lee (2)
- (1) Tsinghua University
- (2) National University of Singapore

Figure: Comparison of 3D tracker-based Street Gaussians (left) and our approach (right). Existing methods rely heavily on object poses, but 3D trackers generalize poorly, producing artifacts in novel view synthesis. 2D foundation models generalize far better. Our approach leverages a 2D foundation model for object tracking and learns point motion in an implicit feature space to autonomously correct tracking errors, improving robustness across diverse scenes.
Abstract
Realistic scene reconstruction in driving scenarios poses significant challenges due to fast-moving objects. Most existing methods rely on labor-intensive manual labeling of object poses to reconstruct dynamic objects in canonical space and move them according to these poses during rendering. While some approaches attempt to replace manual annotations with 3D object trackers, the limited generalization of 3D trackers, caused by the scarcity of large-scale 3D datasets, results in inferior reconstructions in real-world settings. In contrast, 2D foundation models demonstrate strong generalization capabilities. To eliminate the reliance on 3D trackers and enhance robustness across diverse environments, we propose a stable object tracking module that leverages associations from 2D deep trackers within a 3D object fusion strategy. We address the inevitable tracking errors by further introducing a motion learning strategy in an implicit feature space that autonomously corrects trajectory errors and recovers missed detections. Experimental results on Waymo-NOTR and KITTI show that our method outperforms existing approaches.
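To make the tracking module concrete, below is a minimal sketch, not the authors' code, of one way 2D tracker associations could be fused in 3D: per-frame boxes carrying persistent track IDs from a 2D foundation tracker select the LiDAR points that project inside them, and robust centroids of those points form a per-object 3D trajectory. All function and field names here are hypothetical.

```python
# Hypothetical sketch: fusing 2D track associations into 3D trajectories.
import numpy as np

def lift_tracks_to_3d(frames):
    """frames: list of dicts with
         'lidar_xyz': (N, 3) LiDAR points in world coordinates
         'lidar_uv' : (N, 2) projections of those points into the image
         'tracks'   : {track_id: [[u0, v0], [u1, v1]]} 2D boxes with IDs
    Returns {track_id: [(frame_idx, (3,) centroid), ...]}."""
    trajectories = {}
    for t, frame in enumerate(frames):
        uv, xyz = frame['lidar_uv'], frame['lidar_xyz']
        for tid, (tl, br) in frame['tracks'].items():
            # keep LiDAR points whose image projection falls inside the box
            inside = np.all((uv >= tl) & (uv <= br), axis=1)
            if inside.sum() < 10:   # too few supporting points; skip
                continue
            # median centroid suppresses background points caught in the box
            center = np.median(xyz[inside], axis=0)
            trajectories.setdefault(tid, []).append((t, center))
    return trajectories
```

Because the 2D tracker supplies the cross-frame associations, no learned 3D detector is needed; the 3D fusion step only aggregates geometry per track ID.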
Method Overview

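The motion learning strategy described in the abstract operates in an implicit feature space. The sketch below is an assumption about one plausible design, not the paper's implementation: each tracked object gets a learnable latent code, and a small MLP maps the code plus a timestamp to a pose residual, so the field can both correct noisy tracked poses and be queried at frames where the tracker missed the object. All names are hypothetical.

```python
# Hypothetical sketch: per-object implicit motion field for trajectory correction.
import torch
import torch.nn as nn

class MotionField(nn.Module):
    def __init__(self, num_objects, latent_dim=32, hidden=128):
        super().__init__()
        # one learnable latent code per tracked object
        self.latents = nn.Embedding(num_objects, latent_dim)
        self.mlp = nn.Sequential(
            nn.Linear(latent_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3 + 3),  # translation + axis-angle rotation residual
        )

    def forward(self, obj_ids, t):
        """obj_ids: (B,) long tensor of track indices; t: (B, 1) normalized times."""
        z = self.latents(obj_ids)
        out = self.mlp(torch.cat([z, t], dim=-1))
        return out[:, :3], out[:, 3:]  # (delta_translation, delta_rotation)

# Usage: residuals refine the lifted trajectory; since the field is continuous
# in t, it can also be evaluated at frames where the 2D tracker lost the object.
field = MotionField(num_objects=8)
dt, dr = field(torch.tensor([0, 3]), torch.tensor([[0.1], [0.5]]))
```

Optimizing such residuals jointly with the scene reconstruction loss is one way tracking errors could be corrected autonomously, since pose errors manifest directly as rendering error.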
Comparisons
