VLM Research Engineer (m/f/d)
mid
via Ashby
About this role
We’re looking for a Research Engineer to push the limits of vision-language models for real-world video understanding. You’ll work on applied, state-of-the-art multimodal models and turn them into production pipelines used by customers.
YOUR ROLE
- Design and adapt vision-language and video models for scene understanding, temporal reasoning and activity / action recognition
- Build and maintain large-scale training and evaluation pipelines on GPU clusters
- Curate and augment video-text and action datasets, including synthetic labels and retrieval-based augmentation
- Develop robust benchmarks for video QA, instruction following and temporal understanding, and use them to drive iterative model improvements…
Skills in this role
Pulled from the job description. These are the keywords we'll weight when scoring your fit.
python, pytorch, teams
