Data Engineer: Scalable Pipelines for ML Workflows
Posted 2026-05-06
Remote, USA
Full-time
Immediate Start
- Roles and Responsibilities -
- Design, build, and maintain scalable and reliable data pipelines for dataset creation, transformation, and benchmarking
- Own and optimize Airflow pipelines on AWS for data processing, orchestration, and evaluation workflows
- Write efficient, production-grade SQL and Python code for large-scale data processing and analysis
- Partner closely with ML engineers to enable model training, evaluation, and benchmarking pipelines
- Improve pipeline performance, reliability, and observability, ensuring high data quality in production
- Build and maintain systems to support model performance tracking and data drift monitoring
- Troubleshoot and resolve data issues across pipelines, ensuring minimal impact on ML workflows
- Contribute to data architecture decisions and best practices across the platform
- Collaborate cross-functionally with ML, platform, and data teams to support scalable ML infrastructure
- What We're Looking For -
- 3–5 years of experience in Data Engineering, Data Platforms, or related roles
- Strong proficiency in Python and SQL with experience in production systems
- Hands-on experience with AWS services (S3, EC2, SageMaker, or similar)
- Solid experience building and managing pipelines in Airflow (or a similar orchestration tool)
- Strong understanding of data engineering fundamentals (ETL/ELT, data modeling, pipeline design)
- Experience working with large-scale datasets and distributed data systems
- Experience supporting ML workflows, datasets, or evaluation pipelines
- Strong problem-solving skills and ability to work independently in a fast-paced environment
- Nice to Have -
- Experience with ML infrastructure, MLOps, or model evaluation workflows
- Exposure to biometric systems or computer vision datasets
- Familiarity with data quality frameworks, monitoring, and observability tools
- Experience working in SaaS or high-scale production environments