Senior Machine Learning Engineer, On-Device & Mobile AI Optimization

Full Time | San Francisco, CA | Unity South APAC (SEA, ANZ, IND Subcont.)

Compensation: $188,200 – $282,200 USD base salary
Additional Compensation: Equity and participation in company incentive plans (where applicable)

The Opportunity

We are building the next generation of AI-driven game experiences by running generative models directly on players' devices—including phones, tablets, laptops, and desktops.

Our games run inside a modern browser-native runtime built on technologies such as WebGPU and WebNN, meaning the models powering these experiences must be deployed and accelerated entirely within that environment.

As a Senior Machine Learning Engineer – On-Device & Mobile AI, you will take state-of-the-art multimodal models—including transformers, diffusion networks, and vision-language models (VLMs)—and make them fast, efficient, and reliable on mobile and resource-constrained hardware.

This is a deeply hands-on engineering role. You will own significant portions of the inference stack, from a trained research checkpoint through export, quantization, kernel optimization, and production deployment at interactive frame rates within strict memory and power budgets.

Your work will directly determine the latency, quality, memory footprint, and battery consumption of AI features experienced by billions of players.

If you're energized by closing the gap between research models and shipping products—and enjoy profilers, frame captures, operator fusion, and squeezing every millisecond of performance—this role is for you.

What You'll Be Doing

Inference & On-Device Optimization

  • Own the optimization pipeline for production models, including:

    • Model export

    • Graph transformation

    • Operator fusion

    • Memory layout planning

    • Hardware-specific optimization across NPUs, mobile GPUs, and desktop GPUs

  • Apply optimization techniques including:

    • INT4, INT8, and FP16 quantization

    • Weight sharing

    • Structured and unstructured pruning

    • Knowledge distillation

  • Validate optimizations against strict latency, memory, power, and quality targets

  • Perform low-level performance optimization by:

    • Writing and tuning WebGPU compute shaders (WGSL)

    • Developing native compute kernels using Metal, Vulkan/SPIR-V, or CUDA where appropriate

  • Profile applications using tools such as:

    • Chrome/Dawn GPU Traces

    • PIX

    • Apple Instruments / Metal System Trace

    • Snapdragon Profiler

    • NVIDIA Nsight

    • RenderDoc

  • Eliminate bottlenecks at the operator and memory bandwidth level

  • Apply efficiency techniques including:

    • Dynamic resolution

    • Token reduction

    • Cross-frame caching and reuse

    • Reduced-step diffusion samplers

Runtime & Systems Integration

  • Work with browser-native inference runtimes, including:

    • ONNX Runtime Web

    • Transformers.js

    • WebLLM

    • TensorFlow.js

  • Integrate with native runtimes such as:

    • CoreML

    • ONNX Runtime

    • TensorFlow Lite

    • ExecuTorch

  • Extend or build custom integration layers where off-the-shelf solutions fall short

  • Build integrations between machine learning runtimes and the game engine, including:

    • Real-time scheduling

    • Memory pooling

    • Zero-copy buffer sharing between inference and rendering

    • Frame budget management

  • Develop supporting systems including:

    • Model packaging pipelines

    • Asset delivery

    • Device capability tiers

    • Crash and quality telemetry

    • Automated on-device benchmarking in CI

Research Productionization

  • Partner closely with research scientists to transform cutting-edge computer vision and multimodal models into production-ready implementations

  • Provide feedback to research teams regarding:

    • Hardware limitations

    • Operator support gaps

    • Performance cost models

  • Evaluate advances in:

    • Efficient attention

    • Knowledge distillation

    • Reduced-step diffusion

  • Focus on improvements that deliver measurable gains in latency, memory usage, and power efficiency

Collaboration & Engineering Quality

  • Contribute to engineering standards for:

    • Code review

    • Performance regression testing

    • On-device benchmarking

  • Track KPIs for:

    • Latency

    • Quality

    • Memory usage

    • Power consumption

  • Collaborate with platform engineers, product managers, and runtime teams to align engineering work with product requirements and device constraints

  • Mentor junior and mid-level engineers through code reviews, design discussions, and pair programming

What We're Looking For

  • 5+ years of software engineering or machine learning engineering experience, including meaningful work in on-device inference, edge AI, or other performance-critical systems

  • Production experience deploying transformer- and diffusion-based models—including Vision Transformers (ViT), Stable Diffusion, and CLIP/SigLIP-style encoders—on mobile, desktop, or embedded hardware

  • Hands-on experience with at least one major inference runtime, such as:

    • ONNX Runtime

    • ONNX Runtime Web

    • CoreML

    • TensorFlow Lite

    • ExecuTorch

  • Strong understanding of:

    • Operator fusion

    • Memory layout optimization

    • Runtime scheduling

  • Low-level GPU performance engineering experience using one or more of:

    • WebGPU / WGSL

    • Metal

    • Vulkan

    • Direct3D 12

    • CUDA

  • Ability to analyze frame captures and kernel traces to diagnose performance bottlenecks

  • Practical experience applying:

    • Quantization

    • Weight sharing

    • Pruning

    • Knowledge distillation

  • Understanding of target hardware, including:

    • Apple Silicon GPUs and the Apple Neural Engine

    • Qualcomm Hexagon and Adreno

    • ARM Mali

    • NVIDIA GPUs

    • AMD GPUs

    • Intel GPUs

  • Strong Python skills for model export pipelines and training-side tooling

  • Familiarity with TypeScript, JavaScript, and WGSL is a plus

  • Working fluency with the machine learning models you deploy, including the ability to understand architectures, adapt them for deployment, and balance performance with accuracy

  • Strong collaboration and communication skills

Nice to Have

  • Experience shipping:

    • World models

    • Neural rendering systems

    • Real-time generative AI

    • NeRF

    • 3D Gaussian Splatting (3DGS)

  • Hands-on WebGPU deployment experience using:

    • ONNX Runtime Web (WebGPU Execution Provider)

    • Transformers.js

    • WebLLM

    • TensorFlow.js

  • Experience writing and optimizing WGSL compute shaders

  • Background in game engines such as Unity, Unreal Engine, or proprietary engines

  • Experience integrating compute workloads alongside rendering pipelines using:

    • Metal

    • Vulkan

    • Direct3D

    • OpenGL ES

  • Contributions to open-source inference runtimes, GPU libraries, or WebGPU tooling

  • Familiarity with compiler frameworks including:

    • MLIR

    • TVM

    • IREE

    • XLA

  • Experience building on-device benchmarking infrastructure and performance regression systems

  • Proficiency with C++, Objective-C, or Swift for runtime integration

Additional Information

  • Relocation assistance is not available

  • Visa sponsorship is not available

Benefits

Unity offers a comprehensive benefits package designed to support employee well-being and work-life balance. Benefits vary by country and employment status but may include:

  • Comprehensive health, life, and disability insurance

  • Commute subsidy

  • Employee stock ownership

  • Competitive retirement and pension plans

  • Generous vacation and personal leave

  • Family leave and caregiver support

  • Office food and snacks

  • Mental health and well-being programs

  • Employee Resource Groups (ERGs)

  • Global Employee Assistance Program

  • Learning and development opportunities

  • Volunteer and donation matching programs

Life at Unity

Unity (NYSE: U) is the world's leading game engine, powering experiences for more than 3 billion consumers each month.

The world's leading mobile games, top PC indie titles, innovative console games, XR experiences, and web experiences are built with Unity.

Beyond gaming, Unity enables organizations across industries—including automotive, manufacturing, and healthcare—to design, simulate, and collaborate in 3D, helping bridge the gap between ideas and reality.

Equal Employment Opportunity

Unity is proud to be an equal opportunity employer committed to fostering an inclusive and innovative workplace.

We celebrate diversity across age, race, color, ancestry, national origin, religion, disability, sex, gender identity or expression, sexual orientation, and every other protected characteristic recognized under applicable law.

Qualified applicants with arrest or conviction records will be considered in accordance with the San Francisco Fair Chance Ordinance.

Reasonable accommodations are available throughout the interview process for candidates with disabilities.

Additional Notes

  • Professional proficiency in English is required due to regular collaboration with global teams.

  • Unity does not accept unsolicited resumes from recruiters or staffing agencies without an existing signed agreement.

  • Please review Unity's Prospect Privacy Policy and Applicant Privacy Policy for additional information regarding candidate data and privacy.

Compensation

The anticipated base salary range for this position is:

$188,200 – $282,200 USD

In addition to base salary, this position may be eligible for:

  • Equity awards

  • Annual incentive plans, including discretionary bonuses or sales commissions

Final compensation will depend on geographic location, experience, professional background, and technical qualifications.