Member of Technical Staff - Post-Training
Posted 2026-05-06
Remote, USA
Full-time
Immediate Start
Microsoft is dedicated to advancing post-training methods for AI models and is seeking a highly skilled Member of Technical Staff focused on AI data and training to join its team. In this role, you will create world-class datasets, train models, and develop scalable data pipelines that impact cutting-edge language and multimodal models.
Responsibilities
- Design & Evaluate Datasets – Build high-quality datasets and benchmarks for training AI models; run ablation studies to measure impact and optimize data effectiveness
- Advance Model Training – Apply deep expertise in pre-training, post-training, and reinforcement learning (RL) for both language and multimodal models
- Develop Data Infrastructure – Create and maintain scalable pipelines for ingestion, preprocessing, filtering, and annotation of large, complex datasets
- Data Quality & Analysis – Assess real-world multimodal datasets (text, image, video, audio, code) for quality, diversity, and relevance; identify gaps and propose improvements
- Tooling & Workflows – Build lightweight tools for dataset auditing, visualization, and versioning to streamline experimentation
- Research & Innovation – Collaborate with cross-functional teams to push research and product boundaries, delivering models that make a real-world impact
Skills
- Bachelor's Degree (complete or in progress) in a relevant field AND 3+ months of related research internship experience OR Master's Degree in a relevant field OR equivalent experience
- Software engineering skills with fluency in Python and modern data libraries
- The ability to meet Microsoft, customer, and/or government security screening requirements is required for this role
- Master's Degree in a relevant field AND 1+ year(s) of related research experience OR equivalent experience
- Coding expertise in Python and data libraries (Pandas, NumPy, etc.)
- Proficiency with distributed data frameworks (Spark, Ray, Apache Beam) and cloud ecosystems (Azure, data lakes)
- Hands-on experience with large-scale, unstructured or semi-structured datasets: images, video, audio, and code
- Proven experience training AI models at significant scale
- Demonstrated ability to collaborate within interdisciplinary teams and communicate complex, multimodal research concepts effectively