Principal Engineer, On-Device AI Inference & Systems
Full Time | Mountain View, CA | Unity
Compensation: $278,100 – $347,600 USD base salary
Additional Compensation: Equity and participation in company incentive plans (where applicable)
The Opportunity
We are building the next generation of AI-driven game experiences by running generative models directly on players' devices—including phones, tablets, laptops, and desktops.
Our games run inside a modern browser-native runtime built on technologies such as WebGPU and WebNN, meaning the models powering these experiences must be deployed and accelerated entirely within that environment.
As Principal Engineer – On-Device AI Inference & Systems, you will be the technical authority responsible for taking state-of-the-art multimodal models—including transformers and diffusion networks—and making them fast, efficient, and reliable within a production game engine.
This is a deeply hands-on, high-impact engineering leadership role. You will own the inference and integration stack end-to-end—from research checkpoint through optimization, kernel tuning, and production deployment at interactive frame rates within strict memory and power budgets.
You will establish engineering standards, define runtime architecture, mentor senior engineers, and directly shape the latency, quality, memory footprint, and battery efficiency of AI-powered gameplay experienced by millions of players worldwide.
If you're passionate about bridging cutting-edge research and shipping production AI systems—and enjoy profilers, frame captures, operator fusion, and squeezing every millisecond of performance—this role is for you.
What You'll Be Doing
Inference & On-Device Optimization
Own the complete optimization pipeline, including:
Model export
Graph transformation
Operator fusion
Memory layout planning
Hardware-specific kernel tuning across NPUs, mobile GPUs, and desktop GPUs
Make technical decisions regarding:
INT4, INT8, and FP16 quantization
Weight sharing
Structured and unstructured pruning
Knowledge distillation
Validate all optimizations against latency, memory, power, and quality requirements
Drive low-level GPU optimization by:
Writing and tuning WebGPU compute shaders (WGSL)
Developing native compute kernels using Metal, Vulkan/SPIR-V, Direct3D 12, and CUDA where appropriate
Profile and optimize using tools such as:
Chrome/Dawn GPU Traces
PIX
Apple Instruments / Metal System Trace
Snapdragon Profiler
NVIDIA Nsight
RenderDoc
Identify and eliminate operator-level and memory bandwidth bottlenecks
Apply efficiency techniques including:
Dynamic resolution
Token reduction
Cross-frame caching
Reduced-step diffusion sampling
Runtime & Systems Integration
Evaluate, select, and drive adoption of WebGPU inference runtimes, including:
ONNX Runtime Web
Transformers.js
WebLLM
TensorFlow.js
Work alongside native runtimes such as:
Core ML
ONNX Runtime
TensorFlow Lite
ExecuTorch
Extend or develop runtime and integration layers where existing solutions are insufficient
Design and own integration between the ML runtime and the game engine, including:
Real-time scheduling
Threading
Memory pooling
Zero-copy buffer sharing between inference and rendering
Frame budget management
Architect inference systems capable of processing:
Images
Text
3D primitives
Metadata
Build systems that remain robust under:
Cold starts
Thermal throttling
Device fragmentation
Background execution
Develop supporting infrastructure including:
Model packaging
Asset pipelines
Device capability tiers
Crash and quality telemetry
Automated on-device benchmarking in CI
Research Productionization
Partner closely with research scientists to transform cutting-edge multimodal architectures into production-ready implementations
Provide feedback to research teams regarding:
Hardware constraints
Operator support limitations
Deployment cost models
Evaluate advances in:
Efficient attention
Knowledge distillation
Reduced-step diffusion
Focus engineering effort on techniques that deliver measurable improvements in latency, memory usage, and power efficiency
Engineering Leadership
Lead and mentor a team of engineers
Establish engineering best practices for:
Code review
Performance regression testing
On-device benchmarking
Define and enforce KPIs across:
Latency
Quality
Memory usage
Power consumption
Collaborate with platform engineering, product management, and runtime teams to align technical direction with product roadmaps and hardware constraints
What We're Looking For
8+ years of software or machine learning engineering experience
At least 4 years focused on:
On-device AI
Edge inference
Real-time performance-critical systems
Proven experience shipping transformer- or diffusion-based models—including Vision Transformers (ViT) and Stable Diffusion—on mobile, desktop, or embedded hardware
Hands-on deployment experience using WebGPU runtimes such as:
ONNX Runtime Web (WebGPU Execution Provider)
Transformers.js
WebLLM
TensorFlow.js
Experience writing and optimizing WGSL compute shaders and working within WebGPU's adapter, device limits, and resource binding model
Equivalent expertise with native GPU APIs plus a demonstrated ability to transition to WebGPU is also valued
Deep expertise with at least one inference runtime, including:
ONNX Runtime
ONNX Runtime Web
Core ML
TensorFlow Lite
ExecuTorch
Strong understanding of:
Operator fusion
Memory layout optimization
Runtime scheduling
Low-level GPU performance engineering experience using:
WebGPU / WGSL
Metal
Vulkan
Direct3D 12
CUDA
Ability to analyze frame captures and kernel traces to identify performance bottlenecks
Practical experience applying:
Quantization
Weight sharing
Pruning
Knowledge distillation
Strong understanding of modern deployment hardware, including:
Apple Neural Engine
Qualcomm Hexagon and Adreno
Arm Mali
Apple Silicon
NVIDIA GPUs
AMD GPUs
Intel GPUs
Strong proficiency in:
TypeScript
JavaScript
WGSL
Python
Ability to understand, modify, and optimize modern machine learning architectures while balancing deployment tradeoffs
Demonstrated technical leadership, including setting engineering direction, influencing cross-functional teams, and mentoring engineers
Nice to Have
Experience deploying:
World models
Neural rendering systems
Real-time diffusion
NeRF
3D Gaussian Splatting (3DGS)
Extensive experience with real-time graphics or game engines such as:
Unity
Unreal Engine
Proprietary engines
Experience integrating GPU compute workloads alongside rendering pipelines using:
Metal
Vulkan
Direct3D
OpenGL ES
Contributions to open-source inference runtimes, GPU libraries, or WebGPU tooling, including projects such as:
Dawn
wgpu
ONNX Runtime Web
Transformers.js
WebLLM
Familiarity with advanced WebGPU capabilities such as:
Subgroups
FP16 / shader-f16
Timestamp queries
Experience balancing browser runtime limitations with large diffusion workloads
Familiarity with compiler technologies including:
MLIR
TVM
IREE
XLA
Experience building large-scale device benchmarking infrastructure and performance regression systems
Additional Information
International relocation assistance is not available
Visa sponsorship is not available
This posting is intended to fill an existing vacancy, and applicants will receive updates throughout the hiring process in accordance with applicable law
Benefits
Unity offers a comprehensive benefits package designed to support employee well-being and work-life balance. Benefits vary by country and employment status but may include:
Comprehensive health, life, and disability insurance
Commute subsidy
Employee stock ownership
Competitive retirement and pension plans
Generous vacation and personal leave
Family leave and caregiver support
Office food and snacks
Mental health and well-being programs
Employee Resource Groups (ERGs)
Global Employee Assistance Program
Learning and development opportunities
Volunteer and donation matching programs
Life at Unity
Unity (NYSE: U) is the world's leading game engine, powering experiences for more than 3 billion consumers each month.
The world's leading mobile games, top PC indie titles, innovative console games, XR experiences, and web experiences are built with Unity.
Beyond gaming, Unity enables organizations across industries—including automotive, manufacturing, and healthcare—to design, simulate, and collaborate in 3D, helping bridge the gap between ideas and reality.
Equal Employment Opportunity
Unity is proud to be an equal opportunity employer committed to fostering an inclusive and innovative workplace.
We celebrate diversity across age, race, color, ancestry, national origin, religion, disability, sex, gender identity or expression, sexual orientation, and every other protected characteristic recognized under applicable law.
Reasonable accommodations are available throughout the interview process for candidates with disabilities.
Additional Notes
Professional proficiency in English is required due to regular collaboration with global teams.
Unity does not accept unsolicited resumes from recruiters or staffing agencies without an existing signed agreement.
Please review Unity's Prospect Privacy Policy and Applicant Privacy Policy for additional information regarding candidate data and privacy.
Compensation
The anticipated base salary range for this position is:
$278,100 – $347,600 USD
In addition to base salary, this role may be eligible for:
Equity awards
Annual incentive plans, including discretionary bonuses or sales commissions
Final compensation will depend on geographic location, experience, professional background, and technical qualifications.