Research Engineer, Evals

Mid-level · $250K–$400K

via Ashby

About this role

At Variance, we are teaching machines to make the hardest judgment calls at scale. That means building AI agents for the high-stakes gray area of risk investigations, fraud, and identity reviews. We’re a small, talent-dense team in San Francisco working on a problem at the edge of what AI systems can reliably do: making good decisions in messy, adversarial, real-world environments. We focus on high-consequence systems problems where the edge cases matter most. We’re looking for a Research Engineer to help define how we measure and improve model quality. You’ll build the benchmarks, datasets, tooling, and evaluation loops that tell us whether our systems are actually getting better on the tasks that matter.…

Read the full description on Intrinsic-safety's site →

What we'd score you on

reqspace match rubric

Five dimensions, recruiter-grade. Upload your resume and we'll generate a written explanation of where you fit and where the gaps are.

1. Skills match

For this role: teams

2. Level fit

This role is mid-level. We check your trajectory against it.

3. Domain experience

Your work in the role's domain matters more than your total years of experience. We weight recent and direct experience.

4. Recency

A skill you used last quarter weighs more than one from five years ago. We grade on recency, not lifetime.

5. Location fit

This role is based in a specific location. We weight your proximity and willingness to relocate.


Skills in this role

Pulled from the job description. These are the keywords we'll weight when scoring your fit.

teams
