Senior Machine Learning Platform Engineer
Shanghai · onsite · senior
via Greenhouse
About this role
Key Responsibilities
Build the compute platform and machine learning libraries for large-scale machine learning and simulation workloads
Focus on compute platform stability and efficiency on both CPU and GPU clusters, making the platform observable and scalable
Use cluster monitoring and profiling tools to identify bottlenecks and optimize both infrastructure and software systems
Troubleshoot and resolve issues related to OS, storage, network, and GPUs
Challenges You Will Tackle: Design, build, and improve our compute platform for PB-scale data model training and simulations with a wide range of machine learning models, leveraging our existing research infrastructure.
Requirements:
Solid experience in running production machine learning infrastructure at a large scale…
More at Optiverus
- Network Engineer · Sydney, Australia
- PhD Quant Focus (Singapore) 2026 · Sydney, Australia
- Head of Business Development (D1 Cash) · Hong Kong, China
- Information Security Risk Analyst · Amsterdam, North Holland, Netherlands
- Institutional Trader · London, England, United Kingdom
- Institutional Trader · Amsterdam, North Holland, Netherlands
