Member of Technical Staff (AI Inference Engineer)

mid

via Ashby

See if I'm a fit →Tailor my resume for this role →Apply on Ashby ↗

About this role

We build and run the inference engine behind every Perplexity query and deploy dozens of model architectures at scale with tight latency and cost budgets. Our stack is Rust, Python, CUDA, and CuTe DSL - and we need another engineer to join us. WHAT YOU WILL WORK ON Examples of real work the team does: - New models support. Support transformer-based retrieval, text-generation, and multimodal models in our inference infrastructure, from weight loading, request scheduling and KV-cache management to support in API Gateway. - GPU kernels migration to CuTe DSL. Port our in-house CUDA kernels to NVIDIA's CuTe DSL so they run on GB200 today and are portable to Vera Rubin racks tomorrow. - Rust-native serving runtime.…

Read the full description on Perplexity's site →

What we'd score you on

reqspace match rubric

Five dimensions, recruiter-grade. Upload your resume and we'll generate a written explanation of where you fit and where the gaps are.

Skills match

For this role: python, rust, sass, kubernetes, pytorch…

Level fit

This role is mid-level. We check your trajectory against it.

Domain experience

Your work in the role's domain matters more than your years total. We weight recent and direct experience.

Recency

A skill you used last quarter weighs more than one from five years ago. We grade on recency, not lifetime.

Location fit

This role is based in a specific location. We weight your proximity and willingness to relocate.

Score yourself on this role.

Free · no card · written explanation included

See if I'm a fit →

Skills in this role

Pulled from the job description. These are the keywords we'll weight when scoring your fit.

pythonrustsasskubernetespytorchtensorflowjaxmonday

More at Perplexity

See all open jobs at Perplexity →