Member of Technical Staff, Training (Bay Area, Remote)
via Ashby
About this role
WHAT YOU’LL DO
- Drive down wall-clock time to convergence by profiling and eliminating bottlenecks across the foundation model training stack stack, from data pipelines to GPU kernels
- Design, build, and optimize distributed training systems (PyTorch) for multi-node GPU clusters, ensuring scalability, robustness, and high utilization
- Implement efficient low-level code (CUDA, cuDNN, Triton, custom kernels) and integrate it seamlessly into high-level training frameworks
- Optimize workloads for hardware efficiency: CPU/GPU compute balance, memory management, data throughput, and networking
- Develop monitoring and debugging tools for large-scale runs, enabling rapid diagnosis of performance regressions and failures
WHAT YOU’LL BRING…
What we'd score you on
reqspace match rubricFive dimensions, recruiter-grade. Upload your resume and we'll generate a written explanation of where you fit and where the gaps are.
1
Skills match
For this role: python, pytorch
2
Level fit
We check your title trajectory against the seniority signal of the role.
3
Domain experience
Your work in the role's domain matters more than your years total. We weight recent and direct experience.
4
Recency
A skill you used last quarter weighs more than one from five years ago. We grade on recency, not lifetime.
5
Location fit
This role is based in a specific location. We weight your proximity and willingness to relocate.
Score yourself on this role.
Free · no card · written explanation included
Skills in this role
Pulled from the job description. These are the keywords we'll weight when scoring your fit.
pythonpytorch
More at Genesis-ai
- View →Business Development Manager
- View →Member of Technical Staff, Foundation Models (Bay Area)
- View →Member of Technical Staff, Robot Learning (Bay Area)
- View →Member of Technical Staff, Mechanical Engineer (Bay Area)
- View →Member of Technical Staff, Data (Paris, London)
- View →Robot Data Operations Analyst
