Spatial AI & SLAM Engineer

Full Time | Palo Alto, CA, US | PROception

Design and deploy real-time spatial perception systems that allow humanoid and mobile robots to understand, navigate, and interact with complex 3D environments. You will work at the intersection of classical geometry, state estimation, and modern learning-based scene representations to build long-horizon world models that directly power manipulation and Vision-Language-Action (VLA) systems in the real world.



Requirements:

  • MS or PhD in Robotics, Computer Vision, Computer Science, or a related field, or equivalent industry experience

  • Strong background in SLAM, state estimation, and probabilistic sensor fusion

  • Deep understanding of 3D geometry, multi-view geometry, camera models, and calibration

  • Hands-on experience building perception systems for real robotic platforms

  • Experience with neural scene representations such as NeRFs, neural occupancy grids, or implicit SDFs

  • Proficiency in Python and C++ in Linux-based robotics environments

  • Experience working with large-scale datasets and long-running perception or learning experiments

  • Self-driven, systems-oriented, and excited about deploying perception on real robots

  • (+) Familiarity with Vision-Language-Action (VLA) or embodied AI systems

  • (+) Experience with tactile, force, or event-based sensors

  • (+) Background in Gaussian splatting, neural SDFs, or hybrid geometric-learning map representations

  • (+) Experience integrating perception outputs into manipulation or control pipelines

  • (+) Familiarity with Isaac Sim, MuJoCo, or photorealistic simulation environments



Benefits:

  • Competitive salary and meaningful equity

  • Comprehensive health, dental, and vision coverage

  • Work with world-class researchers and engineers in robotics and AI

  • High-ownership role with impact on core robot intelligence systems

  • Opportunity to define the future of real-world spatial perception and embodied AI



Responsibilities:

  • Design and deploy real-time SLAM and state-estimation systems for humanoid and mobile robots

  • Build multi-sensor fusion pipelines combining RGB-D, stereo, LiDAR, IMU, and tactile sensing

  • Develop neural scene representations including NeRFs, neural occupancy grids, and signed distance fields

  • Bridge classical geometry-based perception with learning-based spatial models

  • Enable long-horizon 3D world modeling for manipulation and interaction tasks

  • Integrate perception outputs into Vision-Language-Action (VLA) systems

  • Improve robustness under occlusion, motion blur, lighting variation, and sensor noise

  • Support sim-to-real transfer using synthetic data generation and domain randomization

  • Collaborate closely with robotics, controls, and learning teams to close the perception-action loop