Spatial perception has witnessed unprecedented progress in the last decade. Robots are now able to detect objects, localize them, and create large-scale maps of unknown environments, which are crucial capabilities for navigation and manipulation. Despite these advances, both researchers and practitioners are well aware of the brittleness of current perception systems, and a large gap still separates robot and human perception. While many applications can afford occasional failures (e.g., AR/VR, domestic robotics) or can structure the environment to simplify perception (e.g., industrial robotics), safety-critical applications of robotics in the wild, ranging from self-driving vehicles to search and rescue, demand a new generation of algorithms. This talk discusses two efforts targeted at bridging this gap. The first focuses on robustness: I present recent advances in the design of certifiably robust spatial perception algorithms that tolerate extreme amounts of outliers and afford performance guarantees. These algorithms are “hard to break” and are able to work in regimes where all related techniques fail. The second effort targets metric-semantic understanding. While humans are able to quickly grasp both geometric and semantic aspects of a scene, high-level scene understanding remains a challenge for robotics. I present recent work on real-time metric-semantic understanding, which combines robust estimation with deep learning. I discuss these efforts and their applications to a variety of perception problems, including mesh registration, image-based object localization, and robot Simultaneous Localization and Mapping (SLAM).