Spatial AI & SLAM Engineer

Full Time | Palo Alto, CA, US | PROception

Design and deploy real-time spatial perception systems that allow humanoid and mobile robots to understand, navigate, and interact with complex 3D environments. You will work at the intersection of classical geometry, state estimation, and modern learning-based scene representations to build long-horizon world models that directly power manipulation and Vision-Language-Action (VLA) systems in the real world.



Requirements:

  • MS or PhD in Robotics, Computer Vision, Computer Science, or a related field, or equivalent industry experience

  • Strong background in SLAM, state estimation, and probabilistic sensor fusion

  • Deep understanding of 3D geometry, multi-view geometry, camera models, and calibration

  • Hands-on experience building perception systems for real robotic platforms

  • Experience with neural scene representations such as NeRFs, neural occupancy grids, or implicit SDFs

  • Proficiency in Python and C++ in Linux-based robotics environments

  • Experience working with large-scale datasets and long-running perception or learning experiments

  • Self-driven, systems-oriented, and excited about deploying perception on real robots

  • (+) Familiarity with Vision-Language-Action (VLA) or embodied AI systems

  • (+) Experience with tactile, force, or event-based sensors

  • (+) Background in Gaussian splatting, neural SDFs, or hybrid geometric-learning map representations

  • (+) Experience integrating perception outputs into manipulation or control pipelines

  • (+) Familiarity with Isaac Sim, MuJoCo, or photorealistic simulation environments



Benefits:

  • Competitive salary and meaningful equity

  • Comprehensive health, dental, and vision coverage

  • Work with world-class researchers and engineers in robotics and AI

  • High-ownership role with impact on core robot intelligence systems

  • Opportunity to define the future of real-world spatial perception and embodied AI



Responsibilities:

  • Design and deploy real-time SLAM and state-estimation systems for humanoid and mobile robots

  • Build multi-sensor fusion pipelines combining RGB-D, stereo, LiDAR, IMU, and tactile sensing

  • Develop neural scene representations including NeRFs, neural occupancy grids, and signed distance fields

  • Bridge classical geometry-based perception with learning-based spatial models

  • Enable long-horizon 3D world modeling for manipulation and interaction tasks

  • Integrate perception outputs into Vision-Language-Action (VLA) systems

  • Improve robustness under occlusion, motion blur, lighting variation, and sensor noise

  • Support sim-to-real transfer using synthetic data generation and domain randomization

  • Collaborate closely with robotics, controls, and learning teams to close the perception-action loop