Abstract
Monocular 3D object detection plays a pivotal role in vehicle perception systems. Current methods often struggle to extract scene-level semantic information, and few monocular 3D detectors are tailored to embedded devices with differing computing power. This paper introduces MonoYolo, a scalable detector designed for practicality and efficiency under varying resource constraints. In particular, we design a Superpixel Feature Pyramid Network (SFPN) that automatically groups pixels with similar attributes. Experimental results on the KITTI and nuScenes datasets show that the large MonoYolo models outperform strong existing monocular detectors, while the lightweight model maintains real-time detection. In addition, the proposed SFPN integrates into existing image-only 3D detectors as a plug-and-play module that improves their monocular 3D object detection performance.
| Original language | English |
|---|---|
| Article number | 113389 |
| Journal | Applied Soft Computing |
| Volume | 180 |
| State | Published - Aug 2025 |
Keywords
- Monocular 3D object detection
- Scalable detector
- Superpixel
- Vehicle perception system