Principal Engineer, On-Device AI Inference & Systems

Full Time | Mountain View, CA | Unity

Compensation: $278,100 – $347,600 USD base salary
Additional Compensation: Equity and participation in company incentive plans (where applicable)

The Opportunity

We are building the next generation of AI-driven game experiences by running generative models directly on players' devices—including phones, tablets, laptops, and desktops.

Our games run inside a modern browser-native runtime built on technologies such as WebGPU and WebNN, meaning the models powering these experiences must be deployed and accelerated entirely within that environment.

As Principal Engineer – On-Device AI Inference & Systems, you will be the technical authority responsible for taking state-of-the-art multimodal models—including transformers and diffusion networks—and making them fast, efficient, and reliable within a production game engine.

This is a deeply hands-on, high-impact engineering leadership role. You will own the inference and integration stack end-to-end—from research checkpoint through optimization, kernel tuning, and production deployment at interactive frame rates within strict memory and power budgets.

You will establish engineering standards, define runtime architecture, mentor senior engineers, and directly shape the latency, quality, memory footprint, and battery efficiency of AI-powered gameplay experienced by millions of players worldwide.

If you're passionate about bridging cutting-edge research and shipping production AI systems, and you enjoy working with profilers and frame captures, fusing operators, and squeezing out every millisecond of performance, this role is for you.

What You'll Be Doing

Inference & On-Device Optimization

  • Own the complete optimization pipeline, including:

    • Model export

    • Graph transformation

    • Operator fusion

    • Memory layout planning

    • Hardware-specific kernel tuning across NPUs, mobile GPUs, and desktop GPUs

  • Make technical decisions regarding:

    • INT4, INT8, and FP16 quantization

    • Weight sharing

    • Structured and unstructured pruning

    • Knowledge distillation

  • Validate all optimizations against latency, memory, power, and quality requirements

  • Drive low-level GPU optimization by:

    • Writing and tuning WebGPU compute shaders (WGSL)

    • Developing native compute kernels using Metal, Vulkan/SPIR-V, Direct3D 12, and CUDA where appropriate

  • Profile and optimize using tools such as:

    • Chrome/Dawn GPU Traces

    • PIX

    • Apple Instruments / Metal System Trace

    • Snapdragon Profiler

    • NVIDIA Nsight

    • RenderDoc

  • Identify and eliminate operator-level and memory bandwidth bottlenecks

  • Apply efficiency techniques including:

    • Dynamic resolution

    • Token reduction

    • Cross-frame caching

    • Reduced-step diffusion sampling

Runtime & Systems Integration

  • Evaluate, select, and drive adoption of WebGPU inference runtimes, including:

    • ONNX Runtime Web

    • Transformers.js

    • WebLLM

    • TensorFlow.js

  • Work alongside native runtimes such as:

    • Core ML

    • ONNX Runtime

    • TensorFlow Lite

    • ExecuTorch

  • Extend or develop runtime and integration layers where existing solutions are insufficient

  • Design and own integration between the ML runtime and the game engine, including:

    • Real-time scheduling

    • Threading

    • Memory pooling

    • Zero-copy buffer sharing between inference and rendering

    • Frame budget management

  • Architect inference systems capable of processing:

    • Images

    • Text

    • 3D primitives

    • Metadata

  • Build systems that remain robust under:

    • Cold starts

    • Thermal throttling

    • Device fragmentation

    • Background execution

  • Develop supporting infrastructure including:

    • Model packaging

    • Asset pipelines

    • Device capability tiers

    • Crash and quality telemetry

    • Automated on-device benchmarking in CI

Research Productionization

  • Partner closely with research scientists to transform cutting-edge multimodal architectures into production-ready implementations

  • Provide feedback to research teams regarding:

    • Hardware constraints

    • Operator support limitations

    • Deployment cost models

  • Evaluate advances in:

    • Efficient attention

    • Knowledge distillation

    • Reduced-step diffusion

  • Focus engineering effort on techniques that deliver measurable improvements in latency, memory usage, and power efficiency

Engineering Leadership

  • Lead and mentor a team of engineers

  • Establish engineering best practices for:

    • Code review

    • Performance regression testing

    • On-device benchmarking

  • Define and enforce KPIs across:

    • Latency

    • Quality

    • Memory usage

    • Power consumption

  • Collaborate with platform engineering, product management, and runtime teams to align technical direction with product roadmaps and hardware constraints

What We're Looking For

  • 8+ years of software or machine learning engineering experience

  • At least 4 years focused on:

    • On-device AI

    • Edge inference

    • Real-time performance-critical systems

  • Proven experience shipping transformer- or diffusion-based models, such as Vision Transformers (ViT) or Stable Diffusion, on mobile, desktop, or embedded hardware

  • Hands-on deployment experience using WebGPU runtimes such as:

    • ONNX Runtime Web (WebGPU Execution Provider)

    • Transformers.js

    • WebLLM

    • TensorFlow.js

  • Experience writing and optimizing WGSL compute shaders and working within WebGPU's adapter, device limits, and resource binding model

  • Equivalent expertise with native GPU APIs plus a demonstrated ability to transition to WebGPU is also valued

  • Deep expertise with at least one inference runtime, including:

    • ONNX Runtime

    • ONNX Runtime Web

    • Core ML

    • TensorFlow Lite

    • ExecuTorch

  • Strong understanding of:

    • Operator fusion

    • Memory layout optimization

    • Runtime scheduling

  • Low-level GPU performance engineering experience using:

    • WebGPU / WGSL

    • Metal

    • Vulkan

    • Direct3D 12

    • CUDA

  • Ability to analyze frame captures and kernel traces to identify performance bottlenecks

  • Practical experience applying:

    • Quantization

    • Weight sharing

    • Pruning

    • Knowledge distillation

  • Strong understanding of modern deployment hardware, including:

    • Apple Neural Engine

    • Qualcomm Hexagon and Adreno

    • ARM Mali

    • Apple Silicon

    • NVIDIA GPUs

    • AMD GPUs

    • Intel GPUs

  • Strong proficiency in:

    • TypeScript

    • JavaScript

    • WGSL

    • Python

  • Ability to understand, modify, and optimize modern machine learning architectures while balancing deployment tradeoffs

  • Demonstrated technical leadership, including setting engineering direction, influencing cross-functional teams, and mentoring engineers

Nice to Have

  • Experience deploying:

    • World models

    • Neural rendering systems

    • Real-time diffusion

    • NeRF

    • 3D Gaussian Splatting (3DGS)

  • Extensive experience with real-time graphics or game engines such as:

    • Unity

    • Unreal Engine

    • Proprietary engines

  • Experience integrating GPU compute workloads alongside rendering pipelines using:

    • Metal

    • Vulkan

    • Direct3D

    • OpenGL ES

  • Contributions to open-source inference runtimes, GPU libraries, or WebGPU tooling, including projects such as:

    • Dawn

    • wgpu

    • ONNX Runtime Web

    • Transformers.js

    • WebLLM

  • Familiarity with advanced WebGPU capabilities such as:

    • Subgroups

    • FP16 / shader-f16

    • Timestamp queries

  • Experience balancing browser runtime limitations with large diffusion workloads

  • Familiarity with compiler technologies including:

    • MLIR

    • TVM

    • IREE

    • XLA

  • Experience building large-scale device benchmarking infrastructure and performance regression systems

Additional Information

  • International relocation assistance is not available

  • Visa sponsorship is not available

  • This posting is intended to fill an existing vacancy, and applicants will receive updates throughout the hiring process in accordance with applicable law

Benefits

Unity offers a comprehensive benefits package designed to support employee well-being and work-life balance. Benefits vary by country and employment status but may include:

  • Comprehensive health, life, and disability insurance

  • Commute subsidy

  • Employee stock ownership

  • Competitive retirement and pension plans

  • Generous vacation and personal leave

  • Family leave and caregiver support

  • Office food and snacks

  • Mental health and well-being programs

  • Employee Resource Groups (ERGs)

  • Global Employee Assistance Program

  • Learning and development opportunities

  • Volunteer and donation matching programs

Life at Unity

Unity (NYSE: U) is the world's leading game engine, powering experiences for more than 3 billion consumers each month.

The world's leading mobile games, top PC indie titles, innovative console games, XR experiences, and web experiences are built with Unity.

Beyond gaming, Unity enables organizations across industries—including automotive, manufacturing, and healthcare—to design, simulate, and collaborate in 3D, helping bridge the gap between ideas and reality.

Equal Employment Opportunity

Unity is proud to be an equal opportunity employer committed to fostering an inclusive and innovative workplace.

We celebrate diversity across age, race, color, ancestry, national origin, religion, disability, sex, gender identity or expression, sexual orientation, and every other protected characteristic recognized under applicable law.

Reasonable accommodations are available throughout the interview process for candidates with disabilities.

Additional Notes

  • Professional proficiency in English is required due to regular collaboration with global teams.

  • Unity does not accept unsolicited resumes from recruiters or staffing agencies without an existing signed agreement.

  • Please review Unity's Prospect Privacy Policy and Applicant Privacy Policy for additional information regarding candidate data and privacy.

Compensation

The anticipated base salary range for this position is:

$278,100 – $347,600 USD

In addition to base salary, this role may be eligible for:

  • Equity awards

  • Annual incentive plans, including discretionary bonuses or sales commissions

Final compensation will depend on geographic location, experience, professional background, and technical qualifications.