Data Engineer: Scalable Pipelines for ML Workflows
Posted 2026-05-06
Remote, USA
Full-time
Immediate Start
- Roles and Responsibilities -
- Design, build, and maintain scalable and reliable data pipelines for dataset creation, transformation, and benchmarking
- Own and optimize Airflow pipelines on AWS for data processing, orchestration, and evaluation workflows
- Write efficient, production-grade SQL and Python code for large-scale data processing and analysis
- Partner closely with ML engineers to enable model training, evaluation, and benchmarking pipelines
- Improve pipeline performance, reliability, and observability, ensuring high data quality in production
- Build and maintain systems to support model performance tracking and data drift monitoring
- Troubleshoot and resolve data issues across pipelines, ensuring minimal impact on ML workflows
- Contribute to data architecture decisions and best practices across the platform
- Collaborate cross-functionally with ML, platform, and data teams to support scalable ML infrastructure
- What We're Looking For -
- 3–5 years of experience in Data Engineering, Data Platforms, or related roles
- Strong proficiency in Python and SQL with experience in production systems
- Hands-on experience with AWS services (S3, EC2, SageMaker, or similar)
- Solid experience building and managing pipelines in Airflow (or a similar orchestration tool)
- Strong understanding of data engineering fundamentals (ETL/ELT, data modeling, pipeline design)
- Experience working with large-scale datasets and distributed data systems
- Experience supporting ML workflows, datasets, or evaluation pipelines
- Strong problem-solving skills and ability to work independently in a fast-paced environment
- Nice to Have -
- Experience with ML infrastructure, MLOps, or model evaluation workflows
- Exposure to biometric systems or computer vision datasets
- Familiarity with data quality frameworks, monitoring, and observability tools
- Experience working in SaaS or high-scale production environments