Senior Machine Learning Platform Engineer

Shanghai · onsite · senior

via Greenhouse

About this role

Key Responsibilities

Build the compute platform and machine learning libraries for large-scale machine learning and simulation workloads.

Focus on compute platform stability and efficiency on both CPU and GPU clusters, making the platform observable and scalable.

Use cluster monitoring and profiling tools to identify bottlenecks and optimize both infrastructure and software systems.

Troubleshoot and resolve issues related to OS, storage, network, and GPUs.

Challenges You Will Tackle

Design, build, and improve our compute platform for PB-scale data model training and simulations with a wide range of machine learning models by leveraging our existing research infrastructure.

Requirements

Solid experience in running production machine learning infrastructure at a large scale…

Read the full description on Optiverus's site →

What we'd score you on

reqspace match rubric

Five dimensions, recruiter-grade. Upload your resume and we'll generate a written explanation of where you fit and where the gaps are.

Skills match

We compare your skills against the role requirements.

Level fit

This role is senior-level. We check your trajectory against it.

Domain experience

Your work in the role's domain matters more than your total years of experience. We weight recent and direct experience.

Recency

A skill you used last quarter weighs more than one from five years ago. We grade on recency, not lifetime.

Location fit

This role is based in Shanghai. We weight your proximity and willingness to relocate.
