Senior Machine Learning Engineer, On-Device & Mobile AI Optimization

Full Time

San Francisco, CA

Unity South APAC (SEA, ANZ, IND Subcont.)


Compensation: $188,200 – $282,200 USD base salary
Additional Compensation: Equity and participation in company incentive plans (where applicable)

The Opportunity

We are building the next generation of AI-driven game experiences by running generative models directly on players' devices—including phones, tablets, laptops, and desktops.

Our games run inside a modern browser-native runtime built on technologies such as WebGPU and WebNN, meaning the models powering these experiences must be deployed and accelerated entirely within that environment.

As a Senior Machine Learning Engineer – On-Device & Mobile AI, you will take state-of-the-art multimodal models—including transformers, diffusion networks, and vision-language models (VLMs)—and make them fast, efficient, and reliable on mobile and resource-constrained hardware.

This is a deeply hands-on engineering role. You will own significant portions of the inference stack, from a trained research checkpoint through export, quantization, kernel optimization, and production deployment at interactive frame rates within strict memory and power budgets.

Your work will directly determine the latency, quality, memory footprint, and battery consumption of AI features experienced by billions of players.

If you're energized by closing the gap between research models and shipping products, and you enjoy working with profilers, frame captures, and operator fusion to squeeze out every millisecond of performance, this role is for you.

What You'll Be Doing

Inference & On-Device Optimization

Own the optimization pipeline for production models, including:
- Model export
- Graph transformation
- Operator fusion
- Memory layout planning
- Hardware-specific optimization across NPUs, mobile GPUs, and desktop GPUs
Apply optimization techniques including:
- INT4, INT8, and FP16 quantization
- Weight sharing
- Structured and unstructured pruning
- Knowledge distillation
Validate optimizations against strict latency, memory, power, and quality targets
Perform low-level performance optimization by:
- Writing and tuning WebGPU compute shaders (WGSL)
- Developing native compute kernels using Metal, Vulkan/SPIR-V, or CUDA where appropriate
Profile applications using tools such as:
- Chrome/Dawn GPU traces
- PIX
- Apple Instruments / Metal System Trace
- Snapdragon Profiler
- NVIDIA Nsight
- RenderDoc
Eliminate bottlenecks at the operator and memory-bandwidth levels
Apply efficiency techniques including:
- Dynamic resolution
- Token reduction
- Cross-frame caching and reuse
- Reduced-step diffusion samplers

Runtime & Systems Integration

Work with browser-native inference runtimes, including:
- ONNX Runtime Web
- Transformers.js
- WebLLM
- TensorFlow.js
Integrate with native runtimes such as:
- CoreML
- ONNX Runtime
- TensorFlow Lite
- ExecuTorch
Extend or build custom integration layers where off-the-shelf solutions fall short
Build integrations between machine learning runtimes and the game engine, including:
- Real-time scheduling
- Memory pooling
- Zero-copy buffer sharing between inference and rendering
- Frame budget management
Develop supporting systems including:
- Model packaging pipelines
- Asset delivery
- Device capability tiers
- Crash and quality telemetry
- Automated on-device benchmarking in CI

Research Productionization

Partner closely with research scientists to transform cutting-edge computer vision and multimodal models into production-ready implementations
Provide feedback to research teams regarding:
- Hardware limitations
- Operator support gaps
- Performance cost models
Evaluate advances in:
- Efficient attention
- Knowledge distillation
- Reduced-step diffusion
Focus on improvements that deliver measurable gains in latency, memory usage, and power efficiency

Collaboration & Engineering Quality

Contribute to engineering standards for:
- Code review
- Performance regression testing
- On-device benchmarking
Track KPIs for:
- Latency
- Quality
- Memory usage
- Power consumption
Collaborate with platform engineers, product managers, and runtime teams to align engineering work with product requirements and device constraints
Mentor junior and mid-level engineers through code reviews, design discussions, and pair programming

What We're Looking For

5+ years of software engineering or machine learning engineering experience, including meaningful work on on-device inference, edge AI, or performance-critical systems
Production experience deploying transformer- and diffusion-based models—including Vision Transformers (ViT), Stable Diffusion, and CLIP/SigLIP-style encoders—on mobile, desktop, or embedded hardware
Hands-on experience with at least one major inference runtime, such as:
- ONNX Runtime
- ONNX Runtime Web
- CoreML
- TensorFlow Lite
- ExecuTorch
Strong understanding of:
- Operator fusion
- Memory layout optimization
- Runtime scheduling
Low-level GPU performance engineering experience using one or more of:
- WebGPU / WGSL
- Metal
- Vulkan
- Direct3D 12
- CUDA
Ability to analyze frame captures and kernel traces to diagnose performance bottlenecks
Practical experience applying:
- Quantization
- Weight sharing
- Pruning
- Knowledge distillation
Understanding of target hardware, including:
- Apple Neural Engine
- Qualcomm Hexagon and Adreno
- ARM Mali
- Apple Silicon
- NVIDIA GPUs
- AMD GPUs
- Intel GPUs
Strong Python skills for model export pipelines and training-side tooling
Familiarity with TypeScript, JavaScript, and WGSL is a plus
Working fluency with the machine learning models you deploy, including the ability to understand architectures, adapt them for deployment, and balance performance with accuracy
Strong collaboration and communication skills

Nice to Have

Experience shipping:
- World models
- Neural rendering systems
- Real-time generative AI
- Neural Radiance Fields (NeRF)
- 3D Gaussian Splatting (3DGS)
Hands-on WebGPU deployment experience using:
- ONNX Runtime Web (WebGPU Execution Provider)
- Transformers.js
- WebLLM
- TensorFlow.js
Experience writing and optimizing WGSL compute shaders
Background in game engines such as Unity, Unreal Engine, or proprietary engines
Experience integrating compute workloads alongside rendering pipelines using:
- Metal
- Vulkan
- Direct3D
- OpenGL ES
Contributions to open-source inference runtimes, GPU libraries, or WebGPU tooling
Familiarity with compiler frameworks including:
- MLIR
- TVM
- IREE
- XLA
Experience building on-device benchmarking infrastructure and performance regression systems
Proficiency with C++, Objective-C, or Swift for runtime integration

Additional Information

Relocation assistance is not available
Visa sponsorship is not available

Benefits

Unity offers a comprehensive benefits package designed to support employee well-being and work-life balance. Benefits vary by country and employment status but may include:

Comprehensive health, life, and disability insurance
Commute subsidy
Employee stock ownership
Competitive retirement and pension plans
Generous vacation and personal leave
Family leave and caregiver support
Office food and snacks
Mental health and well-being programs
Employee Resource Groups (ERGs)
Global Employee Assistance Program
Learning and development opportunities
Volunteer and donation matching programs

Life at Unity

Unity (NYSE: U) makes the world's leading game engine, powering experiences for more than 3 billion consumers each month.

The world's leading mobile games, top PC indie titles, innovative console games, XR experiences, and web experiences are built with Unity.

Beyond gaming, Unity enables organizations across industries—including automotive, manufacturing, and healthcare—to design, simulate, and collaborate in 3D, helping bridge the gap between ideas and reality.

Equal Employment Opportunity

Unity is proud to be an equal opportunity employer committed to fostering an inclusive and innovative workplace.

We celebrate diversity across age, race, color, ancestry, national origin, religion, disability, sex, gender identity or expression, sexual orientation, and every other protected characteristic recognized under applicable law.

Qualified applicants with arrest or conviction records will be considered in accordance with the San Francisco Fair Chance Ordinance.

Reasonable accommodations are available throughout the interview process for candidates with disabilities.

Additional Notes

Professional proficiency in English is required due to regular collaboration with global teams.
Unity does not accept unsolicited resumes from recruiters or staffing agencies without an existing signed agreement.
Please review Unity's Prospect Privacy Policy and Applicant Privacy Policy for additional information regarding candidate data and privacy.

Compensation

The anticipated base salary range for this position is:

$188,200 – $282,200 USD

In addition to base salary, this position may be eligible for:

Equity awards
Annual incentive plans, including discretionary bonuses or sales commissions

Final compensation will depend on geographic location, experience, professional background, and technical qualifications.
