Member of Technical Staff, AI Training Infrastructure

via Ashby

About this role

THE ROLE: As a Training Infrastructure Engineer, you'll design, build, and optimize the infrastructure that powers our large-scale model training operations. Your work will be essential to developing high-performance AI training infrastructure. You'll collaborate with AI researchers and engineers to create robust training pipelines, optimize distributed training workloads, and ensure reliable model development.

KEY RESPONSIBILITIES:
- Design and implement scalable infrastructure for large-scale model training workloads
- Develop and maintain distributed training pipelines for LLMs and multimodal models
- Optimize training performance across multiple GPUs, nodes, and data centers
- Implement monitoring, logging, and debugging tools for training operations…

What we'd score you on

reqspace match rubric

Five dimensions, recruiter-grade. Upload your resume and we'll generate a written explanation of where you fit and where the gaps are.

1. Skills match

For this role: aws, azure, gcp, kubernetes, docker…

2. Level fit

We check your title trajectory against the seniority signal of the role.

3. Domain experience

Your work in the role's domain matters more than your total years of experience. We weight recent and direct experience.

4. Recency

A skill you used last quarter weighs more than one from five years ago. We grade on recency, not lifetime.

5. Location fit

This role is based in a specific location. We weight your proximity and willingness to relocate.


Skills in this role

Pulled from the job description. These are the keywords we'll weight when scoring your fit.

aws, azure, gcp, kubernetes, docker, pytorch
