Machine Learning Systems Research Engineer, Agent Post-training - Enterprise GenAI
Posted 2026-05-06
Remote, USA
Full-time
Immediate Start
Scale AI is a leading AI data foundry focused on accelerating the development of AI applications. The ML Systems Research Engineer will build and optimize algorithms for a next-gen Agent RL training platform, collaborating with teams to enhance model training and deployment for various enterprise applications.
Responsibilities
- Build, profile and optimize our training and inference framework
- Post-train state-of-the-art models, both internally developed and from the community, to define stable post-training recipes for our enterprise engagements
- Collaborate with ML teams to accelerate their research and development, and enable them to develop the next generation of models and data curation
- Create a next-gen agent training algorithm for multi-agent/multi-tool rollouts
Skills
- 1-3 years of experience training LLMs in a production environment
- Passionate about system optimization
- Experience with post-training methods such as RLHF/RLVR and related algorithms such as PPO and GRPO
- Demonstrated understanding of how modern GPU clusters are architected and operated
- Experience with multi-node LLM training and inference
- Strong software engineering skills, including proficiency with frameworks and tools such as CUDA, PyTorch, transformers, and FlashAttention
- Strong written and verbal communication skills to operate in a cross-functional team environment
- PhD or Master's degree in Computer Science or a related field
Benefits
- Comprehensive health, dental and vision coverage
- Retirement benefits
- A learning and development stipend
- Generous PTO
- Commuter stipend