Accurate 3D object detection from cameras alone remains a fundamental challenge in autonomous driving, particularly
for precise localization and velocity estimation, two metrics critical for safe trajectory planning and collision avoidance.
Existing camera-based methods lift image features into dense Bird's-Eye View (BEV) grids, which struggle to capture
fine-grained geometry and motion cues.
We present GaussianDet3D, to the best of our knowledge the first method to apply 3D Gaussian Splatting from
multi-view images to 3D object detection in autonomous driving, treating the predicted Gaussian primitives as
a pseudo-LiDAR point cloud fed directly into a sparse LiDAR detector. Unlike a LiDAR point, which carries only
coordinates and intensity, each Gaussian encodes parameters capturing geometry, orientation, opacity, and per-class
semantic distributions. By aggregating Gaussian point clouds across multiple frames, GaussianDet3D captures temporal
motion cues that enable precise velocity estimation without explicit tracking.
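The abstract does not specify the exact per-point feature layout, but the idea of flattening Gaussian parameters into pseudo-LiDAR point features and aggregating frames with a timestamp channel can be sketched as follows (a minimal illustration; the function and parameter names are hypothetical, not from the paper):

```python
import numpy as np

def gaussians_to_pseudo_points(means, scales, quats, opacity, class_logits, t):
    """Flatten per-Gaussian parameters into pseudo-LiDAR point features.

    means:        (N, 3) Gaussian centers, used as pseudo point coordinates
    scales:       (N, 3) per-axis extents (geometry)
    quats:        (N, 4) orientation quaternions
    opacity:      (N, 1) opacity values
    class_logits: (N, C) per-class semantic scores
    t:            scalar frame timestamp, broadcast to every point
    """
    time_ch = np.full((means.shape[0], 1), t, dtype=means.dtype)
    # Each row: [x, y, z, sx, sy, sz, qw, qx, qy, qz, alpha, c_1..c_C, t]
    return np.concatenate(
        [means, scales, quats, opacity, class_logits, time_ch], axis=1
    )

def aggregate_frames(frames):
    """Stack pseudo point clouds from multiple frames into one temporal cloud.

    frames: list of per-frame parameter tuples accepted by
    gaussians_to_pseudo_points; the timestamp channel lets a downstream
    sparse detector infer motion (and hence velocity) without tracking.
    """
    return np.concatenate(
        [gaussians_to_pseudo_points(*f) for f in frames], axis=0
    )
```

The resulting array can then stand in for a LiDAR point cloud in a sparse detector, with the extra channels (scale, orientation, opacity, semantics, time) replacing raw intensity.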
On the nuScenes benchmark, GaussianDet3D achieves the lowest translation and velocity errors
among camera-based methods, outperforming BEVFormer by 8.1% and 13.1%, respectively, while remaining
competitive in overall detection score. These results demonstrate that Gaussian Splatting provides a geometrically
precise, semantically rich representation that bridges the gap between image-based perception and LiDAR-quality spatial
reasoning, particularly for the localization and motion estimation tasks most critical to autonomous driving safety.