VLM Research Engineer (m/f/d)

mid

via Ashby

About this role

We’re looking for a Research Engineer to push the limits of vision-language models for real-world video understanding. You’ll work on applied, state-of-the-art multimodal models and turn them into production pipelines used by customers.

YOUR ROLE

- Design and adapt vision-language and video models for scene understanding, temporal reasoning and activity / action recognition
- Build and maintain large-scale training and evaluation pipelines on GPU clusters
- Curate and augment video-text and action datasets, including synthetic labels and retrieval-based augmentation
- Develop robust benchmarks for video QA, instruction following and temporal understanding, and use them to drive iterative model improvements…

Read the full description on Deltia's site →

What we'd score you on

reqspace match rubric

Five dimensions, recruiter-grade. Upload your resume and we'll generate a written explanation of where you fit and where the gaps are.

Skills match

For this role: Python, PyTorch, teams

Level fit

This role is mid-level. We check your trajectory against it.

Domain experience

Your work in the role's domain matters more than your years total. We weight recent and direct experience.

Recency

A skill you used last quarter weighs more than one from five years ago. We grade on recency, not lifetime.

Location fit

This role is based in a specific location. We weight your proximity and willingness to relocate.

Skills in this role

Pulled from the job description. These are the keywords we'll weight when scoring your fit.

Python, PyTorch, teams
