
Type: Long-term contract
We are building a core data platform for a high-growth e-commerce company. The team needs to move from fragmented scripts and dashboards to a unified, automated, and trusted data foundation to support personalization and real-time analytics.
Responsibilities:
Design and build automated CI/CD pipelines for data transformations, ETL/ELT, and ML model training.
Implement a robust framework for data quality testing, validation, and proactive monitoring.
Develop and maintain infrastructure-as-code templates for data pipeline orchestration and environment management.
Establish and automate metadata collection, data lineage tracking, and pipeline observability.
Create standards and tools to enable self-service data pipeline deployment for analytics and data science teams.
Requirements:
5+ years of experience building, automating, and maintaining data pipelines.
Proficiency in Python and SQL for data engineering tasks.
Experience with orchestration tools (Airflow, Dagster, Prefect) and modern data stack components.
Proven track record of implementing data quality checks and testing in a CI/CD context.
Experience with infrastructure-as-code (Terraform, CloudFormation) and CI/CD platforms (GitLab CI, GitHub Actions).
Differentiators:
Practical experience implementing a DataOps methodology or internal data platform.
Knowledge of data discovery and lineage tools (DataHub, Amundsen).
Nice to have:
Experience with Snowflake or BigQuery.
Familiarity with Streamlit for building simple data apps.