Innodata (Nasdaq: INOD) is a global data engineering company. We believe that data and Artificial Intelligence (AI) are inextricably linked. Our mission is to enable the responsible advancement of artificial intelligence by providing the data, evaluation frameworks, and human expertise required to build AI systems that can be trusted at scale. We provide a range of transferable solutions, platforms, and services for Generative AI / AI builders and adopters. In every relationship, we honor our 36+ year legacy delivering the highest quality data and outstanding outcomes for our customers.

Scope of the Role:

We are looking for a curious and driven Data Engineering Intern to join our Data & AI team. You will primarily focus on building and maintaining robust data pipelines and infrastructure, while also contributing to applied AI projects involving Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) systems.

This is a hands-on role. You will work alongside senior engineers and data scientists and contribute to production-grade systems.

The role is roughly 65% Data Engineering and 35% Data Science / Applied AI.

What You’ll Own:

Data Engineering

Design, build, and maintain scalable ETL/ELT data pipelines using tools like Apache Airflow, dbt, or Spark

Work with structured and unstructured data from various sources — APIs, databases, event streams

Write optimized SQL queries and data transformation logic for analytical and ML use cases

Maintain and improve data quality, schema management, and pipeline monitoring

Collaborate on data warehouse and data lake architecture (e.g., Snowflake, BigQuery, Delta Lake)

Document data flows, lineage, and schema definitions

Data Science & Applied AI

Build and evaluate RAG pipelines — chunking, embedding, indexing, and retrieval

Work with vector databases (e.g., Pinecone, Weaviate, pgvector) for semantic search

Integrate LLM APIs (OpenAI, Anthropic, open-source models) into data products or internal tools

Help with prompt engineering, evaluation frameworks, and fine-tuning experiments

Support exploratory data analysis and feature engineering for ML workflows

You’ll Thrive in This Role If You Have:

Current enrollment in a degree program in Computer Science, Data Science, Engineering, or a related field

Solid foundation in Python — comfortable writing clean, modular, production-quality code

Hands-on experience with SQL (query optimization, CTEs, window functions)

Familiarity with at least one cloud platform — AWS, GCP, or Azure

Understanding of data pipeline concepts: batch vs streaming, orchestration, idempotency

Strong analytical mindset with attention to data quality and correctness

Experience with workflow orchestrators: Apache Airflow, Prefect, or Dagster

Exposure to dbt for data transformation and testing

The expected hourly rate for this position is $20/hour.

Please be aware of recruitment scams involving individuals or organizations falsely claiming to represent employers. Innodata will never ask for payment, banking details, or sensitive personal information during the application process. To learn more about how to recognize job scams, please visit the Federal Trade Commission’s guide at https://consumer.ftc.gov/articles/job-scams.

If you believe you’ve been targeted by a recruitment scam, please report it to Innodata at [email protected] and consider reporting it to the FTC at ReportFraud.ftc.gov.
