Principal Engineer, On-Device AI Inference & Systems

Full Time

Mountain View, CA

Unity

Compensation: $278,100 – $347,600 USD base salary
Additional Compensation: Equity and participation in company incentive plans (where applicable)

The Opportunity

We are building the next generation of AI-driven game experiences by running generative models directly on players' devices—including phones, tablets, laptops, and desktops.

Our games run inside a modern browser-native runtime built on technologies such as WebGPU and WebNN, meaning the models powering these experiences must be deployed and accelerated entirely within that environment.

As Principal Engineer – On-Device AI Inference & Systems, you will be the technical authority responsible for taking state-of-the-art multimodal models—including transformers and diffusion networks—and making them fast, efficient, and reliable within a production game engine.

This is a deeply hands-on, high-impact engineering leadership role. You will own the inference and integration stack end-to-end—from research checkpoint through optimization, kernel tuning, and production deployment at interactive frame rates within strict memory and power budgets.

You will establish engineering standards, define runtime architecture, mentor senior engineers, and directly shape the latency, quality, memory footprint, and battery efficiency of AI-powered gameplay experienced by millions of players worldwide.

If you're passionate about bridging cutting-edge research and shipping production AI systems—and enjoy profilers, frame captures, operator fusion, and squeezing every millisecond of performance—this role is for you.

What You'll Be Doing

Inference & On-Device Optimization

Own the complete optimization pipeline, including:
- Model export
- Graph transformation
- Operator fusion
- Memory layout planning
- Hardware-specific kernel tuning across NPUs, mobile GPUs, and desktop GPUs
Make technical decisions regarding:
- INT4, INT8, and FP16 quantization
- Weight sharing
- Structured and unstructured pruning
- Knowledge distillation
Validate all optimizations against latency, memory, power, and quality requirements
Drive low-level GPU optimization by:
- Writing and tuning WebGPU compute shaders (WGSL)
- Developing native compute kernels using Metal, Vulkan/SPIR-V, Direct3D 12, and CUDA where appropriate
Profile and optimize using tools such as:
- Chrome/Dawn GPU Traces
- PIX
- Apple Instruments / Metal System Trace
- Snapdragon Profiler
- NVIDIA Nsight
- RenderDoc
Identify and eliminate operator-level and memory bandwidth bottlenecks
Apply efficiency techniques including:
- Dynamic resolution
- Token reduction
- Cross-frame caching
- Reduced-step diffusion sampling

Runtime & Systems Integration

Evaluate, select, and drive adoption of WebGPU inference runtimes, including:
- ONNX Runtime Web
- Transformers.js
- WebLLM
- TensorFlow.js
Work alongside native runtimes such as:
- CoreML
- ONNX Runtime
- TensorFlow Lite
- ExecuTorch
Extend or develop runtime and integration layers where existing solutions are insufficient
Design and own integration between the ML runtime and the game engine, including:
- Real-time scheduling
- Threading
- Memory pooling
- Zero-copy buffer sharing between inference and rendering
- Frame budget management
Architect inference systems capable of processing:
- Images
- Text
- 3D primitives
- Metadata
Build systems that remain robust under:
- Cold starts
- Thermal throttling
- Device fragmentation
- Background execution
Develop supporting infrastructure including:
- Model packaging
- Asset pipelines
- Device capability tiers
- Crash and quality telemetry
- Automated on-device benchmarking in CI

Research Productionization

Partner closely with research scientists to transform cutting-edge multimodal architectures into production-ready implementations
Provide feedback to research teams regarding:
- Hardware constraints
- Operator support limitations
- Deployment cost models
Evaluate advances in:
- Efficient attention
- Knowledge distillation
- Reduced-step diffusion
Focus engineering effort on techniques that deliver measurable improvements in latency, memory usage, and power efficiency

Engineering Leadership

Lead and mentor a team of engineers
Establish engineering best practices for:
- Code review
- Performance regression testing
- On-device benchmarking
Define and enforce KPIs across:
- Latency
- Quality
- Memory usage
- Power consumption
Collaborate with platform engineering, product management, and runtime teams to align technical direction with product roadmaps and hardware constraints

What We're Looking For

8+ years of software or machine learning engineering experience
At least 4 years focused on:
- On-device AI
- Edge inference
- Real-time performance-critical systems
Proven experience shipping transformer- or diffusion-based models—including Vision Transformers (ViT) and Stable Diffusion—on mobile, desktop, or embedded hardware
Hands-on deployment experience using WebGPU runtimes such as:
- ONNX Runtime Web (WebGPU Execution Provider)
- Transformers.js
- WebLLM
- TensorFlow.js
Experience writing and optimizing WGSL compute shaders and working within WebGPU's adapter, device limits, and resource binding model
Equivalent expertise with native GPU APIs plus a demonstrated ability to transition to WebGPU is also valued
Deep expertise with at least one inference runtime, including:
- ONNX Runtime
- ONNX Runtime Web
- CoreML
- TensorFlow Lite
- ExecuTorch
Strong understanding of:
- Operator fusion
- Memory layout optimization
- Runtime scheduling
Low-level GPU performance engineering experience using:
- WebGPU / WGSL
- Metal
- Vulkan
- Direct3D 12
- CUDA
Ability to analyze frame captures and kernel traces to identify performance bottlenecks
Practical experience applying:
- Quantization
- Weight sharing
- Pruning
- Knowledge distillation
Strong understanding of modern deployment hardware, including:
- Apple Neural Engine
- Qualcomm Hexagon and Adreno
- ARM Mali
- Apple Silicon
- NVIDIA GPUs
- AMD GPUs
- Intel GPUs
Strong proficiency in:
- TypeScript
- JavaScript
- WGSL
- Python
Ability to understand, modify, and optimize modern machine learning architectures while balancing deployment tradeoffs
Demonstrated technical leadership, including setting engineering direction, influencing cross-functional teams, and mentoring engineers

Nice to Have

Experience deploying:
- World models
- Neural rendering systems
- Real-time diffusion
- NeRF
- 3D Gaussian Splatting (3DGS)
Extensive experience with real-time graphics or game engines such as:
- Unity
- Unreal Engine
- Proprietary engines
Experience integrating GPU compute workloads alongside rendering pipelines using:
- Metal
- Vulkan
- Direct3D
- OpenGL ES
Contributions to open-source inference runtimes, GPU libraries, or WebGPU tooling, including projects such as:
- Dawn
- wgpu
- ONNX Runtime Web
- Transformers.js
- WebLLM
Familiarity with advanced WebGPU capabilities such as:
- Subgroups
- FP16 / shader-f16
- Timestamp queries
Experience balancing browser runtime limitations with large diffusion workloads
Familiarity with compiler technologies including:
- MLIR
- TVM
- IREE
- XLA
Experience building large-scale device benchmarking infrastructure and performance regression systems

Additional Information

International relocation assistance is not available
Visa sponsorship is not available
This posting is intended to fill an existing vacancy, and applicants will receive updates throughout the hiring process in accordance with applicable law

Benefits

Unity offers a comprehensive benefits package designed to support employee well-being and work-life balance. Benefits vary by country and employment status but may include:

Comprehensive health, life, and disability insurance
Commute subsidy
Employee stock ownership
Competitive retirement and pension plans
Generous vacation and personal leave
Family leave and caregiver support
Office food and snacks
Mental health and well-being programs
Employee Resource Groups (ERGs)
Global Employee Assistance Program
Learning and development opportunities
Volunteer and donation matching programs

Life at Unity

Unity (NYSE: U) develops the world's leading game engine, powering experiences for more than 3 billion consumers each month.

The world's leading mobile games, top PC indie titles, innovative console games, XR experiences, and web experiences are built with Unity.

Beyond gaming, Unity enables organizations across industries—including automotive, manufacturing, and healthcare—to design, simulate, and collaborate in 3D, helping bridge the gap between ideas and reality.

Equal Employment Opportunity

Unity is proud to be an equal opportunity employer committed to fostering an inclusive and innovative workplace.

We celebrate diversity across age, race, color, ancestry, national origin, religion, disability, sex, gender identity or expression, sexual orientation, and every other protected characteristic recognized under applicable law.

Reasonable accommodations are available throughout the interview process for candidates with disabilities.

Additional Notes

Professional proficiency in English is required due to regular collaboration with global teams.
Unity does not accept unsolicited resumes from recruiters or staffing agencies without an existing signed agreement.
Please review Unity's Prospect Privacy Policy and Applicant Privacy Policy for additional information regarding candidate data and privacy.

Compensation

The anticipated base salary range for this position is:

$278,100 – $347,600 USD

In addition to base salary, this role may be eligible for:

Equity awards
Annual incentive plans, including discretionary bonuses or sales commissions

Final compensation will depend on geographic location, experience, professional background, and technical qualifications.
