Member of Technical Staff (AI Inference Engineer)
mid
via Ashby
About this role
We build and run the inference engine behind every Perplexity query and deploy dozens of model architectures at scale with tight latency and cost budgets. Our stack is Rust, Python, CUDA, and CuTe DSL - and we need another engineer to join us.
WHAT YOU WILL WORK ON
Examples of real work the team does:
- New models support. Support transformer-based retrieval, text-generation, and multimodal models in our inference infrastructure, from weight loading, request scheduling and KV-cache management to support in API Gateway.
- GPU kernels migration to CuTe DSL. Port our in-house CUDA kernels to NVIDIA's CuTe DSL so they run on GB200 today and are portable to Vera Rubin racks tomorrow.
- Rust-native serving runtime.…
What we'd score you on
reqspace match rubricFive dimensions, recruiter-grade. Upload your resume and we'll generate a written explanation of where you fit and where the gaps are.
1
Skills match
For this role: python, rust, sass, kubernetes, pytorch…
2
Level fit
This role is mid-level. We check your trajectory against it.
3
Domain experience
Your work in the role's domain matters more than your years total. We weight recent and direct experience.
4
Recency
A skill you used last quarter weighs more than one from five years ago. We grade on recency, not lifetime.
5
Location fit
This role is based in a specific location. We weight your proximity and willingness to relocate.
Score yourself on this role.
Free · no card · written explanation included
Skills in this role
Pulled from the job description. These are the keywords we'll weight when scoring your fit.
pythonrustsasskubernetespytorchtensorflowjaxmonday
