Job Description

  • Analyze, design, implement, and maintain pipelines that produce business-critical data reliably and efficiently using cloud technologies.
  • Develop new ETL (Extract, Transform, Load) processes on the existing Apache Airflow deployment. Propose new initiatives to improve performance, scalability, reliability, and overall robustness.
  • Collect, process, and clean data from different sources using Python & SQL.
  • Work side by side with the main Architects and Developers to define best practices and guidelines and ensure they are applied properly across all projects.
  • Assess the effort required for new developments and communicate it effectively.
  • Discover new data sources to improve new and existing data pipelines. Be in charge of building and maintaining data pipelines and data models for new and existing projects.
  • Maintain detailed documentation of your work and changes to support data quality and governance.
  • Provide feedback and an expert point of view as needed to support all data initiatives in the company.
  • Improve the quality of existing and new data processes (ETL) by incorporating statistical process control and creating alerts when anomalies are detected in data sources at every step of the pipeline.
  • Benchmark execution times for every pipeline to identify potential availability issues (a minimal sketch of this pattern follows the list).
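
  The last three bullets describe one recurring pattern: an Airflow DAG that loads data, applies a statistical-process-control check, and tracks execution time. Below is a minimal sketch, assuming an Airflow 2.4+ deployment; the DAG id, row-count history, thresholds, and helper names are hypothetical illustrations, not part of this posting.

    # A minimal sketch, not this team's implementation: an Airflow 2.4+ DAG that
    # loads data, applies a simple 3-sigma statistical-process-control check, and
    # sets a per-task SLA so slow runs surface as SLA misses. All names and
    # numbers (dag_id, history, thresholds) are hypothetical.
    from datetime import datetime, timedelta
    from statistics import mean, stdev

    from airflow import DAG
    from airflow.operators.python import PythonOperator


    def extract_and_load(**context):
        # Placeholder for the real extract/transform/load work.
        rows_loaded = 10_000  # hypothetical row count, pushed for the SPC check
        context["ti"].xcom_push(key="rows_loaded", value=rows_loaded)


    def spc_anomaly_check(**context):
        # 3-sigma control check against a recent history of row counts; in a real
        # pipeline the history would come from a metrics store, not a literal.
        history = [9_800, 10_050, 9_950, 10_100, 9_900]
        rows = context["ti"].xcom_pull(task_ids="extract_and_load", key="rows_loaded")
        mu, sigma = mean(history), stdev(history)
        if abs(rows - mu) > 3 * sigma:
            # Failing the task triggers Airflow's normal alerting paths
            # (email, on_failure_callback, etc.).
            raise ValueError(
                f"Row count {rows} outside control limits {mu:.0f} +/- {3 * sigma:.0f}"
            )


    with DAG(
        dag_id="daily_orders_etl",  # hypothetical
        start_date=datetime(2024, 1, 1),
        schedule="@daily",
        catchup=False,
        default_args={"retries": 1, "retry_delay": timedelta(minutes=5)},
    ) as dag:
        load = PythonOperator(
            task_id="extract_and_load",
            python_callable=extract_and_load,
            sla=timedelta(minutes=30),  # execution-time benchmark; breaches are
                                        # recorded as SLA misses
        )
        check = PythonOperator(
            task_id="spc_anomaly_check",
            python_callable=spc_anomaly_check,
        )
        load >> check

  In this shape, the anomaly check and the SLA give the two alert paths the bullets ask for: data-quality deviations fail the task, and slow runs show up in Airflow's SLA-miss reporting.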

Requirements

  • 5+ years of experience as a Data Engineer, ML Engineer, or similar.
  • 4+ years of experience with Python, including object-oriented programming and scripting.
  • Strong experience creating data pipelines and ETL processes with modern technologies, using Apache Airflow as the main workflow management tool.
  • 4+ years of experience with the AWS data ecosystem (S3, Glue, Athena, Redshift, Lambda, EC2, RDS, ...).
  • Experience with large-scale data and query optimization techniques using SQL.
  • Experience managing and monitoring data lake systems.
  • Strong experience with source control management and CI/CD (GitHub Actions / GitLab pipelines).

    Nice to have:
  • Experience with containerized (Docker, Kubernetes) environments in AWS using EKS/ECS and ECR.
  • Knowledge of and hands-on experience with cloud stream-processing systems.
  • Hands-on experience with IaC (Infrastructure as Code) solutions such as Terraform or CloudFormation.

Benefits

-