Machine Learning Infrastructure Engineer

remote · mid

via Ashby

About this role

We’re looking for seasoned ML Infrastructure engineers with experience designing, building, and maintaining training and serving infrastructure for ML research.

Responsibilities:
- Provide infrastructure support to our ML research and product
- Build tooling to diagnose cluster issues and hardware failures
- Monitor deployments, manage experiments, and generally support our research
- Maximize GPU allocation and utilization for both serving and training

Requirements:
- 4+ years of experience supporting the infrastructure within an ML environment
- Experience in developing tools used to diagnose ML infrastructure problems and failures
- Experience with cloud platforms (e.g., Compute Engine, Kubernetes, Cloud Storage)
- Experience working with GPUs…

Read the full description on Character's site →

What we'd score you on

reqspace match rubric

Five dimensions, recruiter-grade. Upload your resume and we'll generate a written explanation of where you fit and where the gaps are.

1. Skills match
For this role: kubernetes, jax

2. Level fit
This role is mid-level. We check your trajectory against it.

3. Domain experience
Your work in the role's domain matters more than your years total. We weight recent and direct experience.

4. Recency
A skill you used last quarter weighs more than one from five years ago. We grade on recency, not lifetime.

5. Location fit
This role is remote-eligible; we factor in your stated location and time-zone overlap.


Skills in this role

Pulled from the job description. These are the keywords we'll weight when scoring your fit.

kubernetes, jax
