Computer Vision / Machine Learning Engineer (Video Generation)

Lex

Software Engineering, Data Science
Beijing, China
Posted on Mar 25, 2026

Summary

If you are passionate about advancing video generation, building state-of-the-art models that synthesize high-quality, controllable video, and optimizing them for on-device deployment, Apple is the right place for you. We are looking for engineers who combine deep technical expertise, creativity, and systems thinking to push the boundaries of video AI.

Description

As part of Apple’s Video Engineering org, you will develop models and infrastructure for video generation and understanding across Apple products. You will work on cutting-edge generative techniques, from diffusion and transformer-based models to frame interpolation and temporal modeling, while ensuring models run efficiently on iPhone, iPad, and Vision Pro. You will collaborate with research scientists, framework engineers, and cross-functional teams to design, train, optimize, and deploy scalable video generation systems.

Responsibilities

  • Design and develop generative video models for high-fidelity, controllable synthesis.
  • Build infrastructure for large-scale training, evaluation, and benchmarking of video models.
  • Investigate model consolidation and shared representation learning across video understanding and generation tasks.
  • Optimize algorithms for runtime, power, memory, and temporal quality on-device.
  • Collaborate with product and research teams to integrate video generation technologies into Apple’s camera and video pipelines.

Minimum Qualifications

  • M.S. or Ph.D. in Computer Science, Electrical Engineering, or a related field, with a focus on computer vision or machine learning.
  • Strong experience in one or more of: generative video modeling, video prediction, temporal modeling, or frame interpolation.
  • Proficiency in deep learning frameworks (PyTorch, JAX) and programming languages (Python, C++).
  • Experience with large-scale training pipelines and deploying models in real-world systems.
  • Strong written and verbal communication skills.

Preferred Qualifications

  • Publications in top-tier conferences (CVPR, ECCV, ICCV, NeurIPS, ICLR).
  • Experience with multi-modal video or text-video generation.
  • Familiarity with optimizing generative models for mobile/embedded devices.
  • Understanding of temporal consistency, controllable generation, and efficient infrastructure for large-scale video modeling.
  • Passion for building scalable, high-quality systems in cross-functional teams.

Apple is an equal opportunity employer that is committed to inclusion and diversity, and thus we treat all applicants fairly and equally. Apple is committed to working with and providing reasonable accommodation to applicants with physical and mental disabilities.