Senior Machine Learning Engineer, On-Device & Mobile AI Optimization
Full Time | San Francisco, CA | Unity South APAC (SEA, ANZ, IND Subcont.)
Senior Machine Learning Engineer – On-Device & Mobile AI
Compensation: $188,200 – $282,200 USD base salary
Additional Compensation: Equity and participation in company incentive plans (where applicable)
The Opportunity
We are building the next generation of AI-driven game experiences by running generative models directly on players' devices—including phones, tablets, laptops, and desktops.
Our games run inside a modern browser-native runtime built on technologies such as WebGPU and WebNN, meaning the models powering these experiences must be deployed and accelerated entirely within that environment.
As a Senior Machine Learning Engineer – On-Device & Mobile AI, you will take state-of-the-art multimodal models—including transformers, diffusion networks, and vision-language models (VLMs)—and make them fast, efficient, and reliable on mobile and resource-constrained hardware.
This is a deeply hands-on engineering role. You will own significant portions of the inference stack, from a trained research checkpoint through export, quantization, kernel optimization, and production deployment at interactive frame rates within strict memory and power budgets.
Your work will directly determine the latency, quality, memory footprint, and battery consumption of AI features experienced by billions of players.
If you're energized by closing the gap between research models and shipping products—and enjoy profilers, frame captures, operator fusion, and squeezing every millisecond of performance—this role is for you.
What You'll Be Doing
Inference & On-Device Optimization
Own the optimization pipeline for production models, including:
Model export
Graph transformation
Operator fusion
Memory layout planning
Hardware-specific optimization across NPUs, mobile GPUs, and desktop GPUs
Apply optimization techniques including:
INT4, INT8, and FP16 quantization
Weight sharing
Structured and unstructured pruning
Knowledge distillation
Validate optimizations against strict latency, memory, power, and quality targets
Perform low-level performance optimization by:
Writing and tuning WebGPU compute shaders (WGSL)
Developing native compute kernels using Metal, Vulkan/SPIR-V, or CUDA where appropriate
Profile applications using tools such as:
Chrome/Dawn GPU traces
PIX
Apple Instruments / Metal System Trace
Snapdragon Profiler
NVIDIA Nsight
RenderDoc
Eliminate bottlenecks at the operator and memory-bandwidth levels
Apply efficiency techniques including:
Dynamic resolution
Token reduction
Cross-frame caching and reuse
Reduced-step diffusion samplers
Runtime & Systems Integration
Work with browser-native inference runtimes, including:
ONNX Runtime Web
Transformers.js
WebLLM
TensorFlow.js
Integrate with native runtimes such as:
Core ML
ONNX Runtime
TensorFlow Lite
ExecuTorch
Extend or build custom integration layers where off-the-shelf solutions fall short
Build integrations between machine learning runtimes and the game engine, including:
Real-time scheduling
Memory pooling
Zero-copy buffer sharing between inference and rendering
Frame budget management
Develop supporting systems including:
Model packaging pipelines
Asset delivery
Device capability tiers
Crash and quality telemetry
Automated on-device benchmarking in CI
Research Productionization
Partner closely with research scientists to transform cutting-edge computer vision and multimodal models into production-ready implementations
Provide feedback to research teams regarding:
Hardware limitations
Operator support gaps
Performance cost models
Evaluate advances in:
Efficient attention
Knowledge distillation
Reduced-step diffusion
Focus on improvements that deliver measurable gains in latency, memory usage, and power efficiency
Collaboration & Engineering Quality
Contribute to engineering standards for:
Code review
Performance regression testing
On-device benchmarking
Track KPIs for:
Latency
Quality
Memory usage
Power consumption
Collaborate with platform engineers, product managers, and runtime teams to align engineering work with product requirements and device constraints
Mentor junior and mid-level engineers through code reviews, design discussions, and pair programming
What We're Looking For
5+ years of software engineering or machine learning engineering experience, including meaningful work on on-device inference, edge AI, or performance-critical systems
Production experience deploying transformer- and diffusion-based models—including Vision Transformers (ViT), Stable Diffusion, and CLIP/SigLIP-style encoders—on mobile, desktop, or embedded hardware
Hands-on experience with at least one major inference runtime, such as:
ONNX Runtime
ONNX Runtime Web
Core ML
TensorFlow Lite
ExecuTorch
Strong understanding of:
Operator fusion
Memory layout optimization
Runtime scheduling
Low-level GPU performance engineering experience using one or more of:
WebGPU / WGSL
Metal
Vulkan
Direct3D 12
CUDA
Ability to analyze frame captures and kernel traces to diagnose performance bottlenecks
Practical experience applying:
Quantization
Weight sharing
Pruning
Knowledge distillation
Understanding of target hardware, including:
Apple Neural Engine
Qualcomm Hexagon and Adreno
ARM Mali
Apple Silicon
NVIDIA GPUs
AMD GPUs
Intel GPUs
Strong Python skills for model export pipelines and training-side tooling
Familiarity with TypeScript, JavaScript, and WGSL is a plus
Working fluency with the machine learning models you deploy, including the ability to understand architectures, adapt them for deployment, and balance performance with accuracy
Strong collaboration and communication skills
Nice to Have
Experience shipping:
World models
Neural rendering systems
Real-time generative AI
NeRF
3D Gaussian Splatting (3DGS)
Hands-on WebGPU deployment experience using:
ONNX Runtime Web (WebGPU Execution Provider)
Transformers.js
WebLLM
TensorFlow.js
Experience writing and optimizing WGSL compute shaders
Background in game engines such as Unity, Unreal Engine, or proprietary engines
Experience integrating compute workloads alongside rendering pipelines using:
Metal
Vulkan
Direct3D
OpenGL ES
Contributions to open-source inference runtimes, GPU libraries, or WebGPU tooling
Familiarity with compiler frameworks such as:
MLIR
TVM
IREE
XLA
Experience building on-device benchmarking infrastructure and performance regression systems
Proficiency with C++, Objective-C, or Swift for runtime integration
Additional Information
Relocation assistance is not available
Visa sponsorship is not available
Benefits
Unity offers a comprehensive benefits package designed to support employee well-being and work-life balance. Benefits vary by country and employment status but may include:
Comprehensive health, life, and disability insurance
Commute subsidy
Employee stock ownership
Competitive retirement and pension plans
Generous vacation and personal leave
Family leave and caregiver support
Office food and snacks
Mental health and well-being programs
Employee Resource Groups (ERGs)
Global Employee Assistance Program
Learning and development opportunities
Volunteer and donation matching programs
Life at Unity
Unity (NYSE: U) makes the world's leading game engine, powering experiences for more than 3 billion consumers each month.
The world's leading mobile games, top PC indie titles, innovative console games, XR experiences, and web experiences are built with Unity.
Beyond gaming, Unity enables organizations across industries—including automotive, manufacturing, and healthcare—to design, simulate, and collaborate in 3D, helping bridge the gap between ideas and reality.
Equal Employment Opportunity
Unity is proud to be an equal opportunity employer committed to fostering an inclusive and innovative workplace.
We celebrate diversity across age, race, color, ancestry, national origin, religion, disability, sex, gender identity or expression, sexual orientation, and every other protected characteristic recognized under applicable law.
Qualified applicants with arrest or conviction records will be considered in accordance with the San Francisco Fair Chance Ordinance.
Reasonable accommodations are available throughout the interview process for candidates with disabilities.
Additional Notes
Professional proficiency in English is required due to regular collaboration with global teams.
Unity does not accept unsolicited resumes from recruiters or staffing agencies without an existing signed agreement.
Please review Unity's Prospect Privacy Policy and Applicant Privacy Policy for additional information regarding candidate data and privacy.
Compensation
The anticipated base salary range for this position is:
$188,200 – $282,200 USD
In addition to base salary, this position may be eligible for:
Equity awards
Annual incentive plans, including discretionary bonuses or sales commissions
Final compensation will depend on geographic location, experience, professional background, and technical qualifications.