As an experienced Data Engineer, I have a proven track record of designing and implementing robust ETL pipelines, optimizing data warehouse solutions, and championing DevOps methodologies. With over 5 years of hands-on experience, I specialize in building scalable data platforms for established corporations and fast-paced startups alike, and I am proficient in cloud environments including Google Cloud Platform (GCP) and Amazon Web Services (AWS).
My expertise extends to orchestrating complex data workflows with tools such as Apache Airflow and Kubernetes, ensuring data integrity and quality control throughout the process. I also bring advanced skills in Python, PySpark, SQL, automation scripting, and Infrastructure as Code (Terraform), along with a strong commitment to continuous improvement and innovation in data infrastructure, grounded in DevOps best practices.
Experience
Contractbook
Apr 2022 – Jan 2024
Copenhagen, Denmark
Data Engineer
● Built a complete data platform from scratch, covering infrastructure, custom ETL pipelines, job orchestration, and the data warehouse in Google Cloud Platform (Services used: BigQuery, Cloud Storage, Kubernetes Engine, Cloud SQL, Cloud Functions, Compute Engine, IAM, Pub/Sub, Dataproc)
● Reduced infrastructure and data operational costs from 3,900 USD/month to 300 USD/month
● Decreased Data Warehouse build time from 5 hours to ~7 minutes
● Managed the Data Warehouse using dbt
● Maintained data lake using Google Cloud Storage
● Developed deployment processes using Terraform
● Managed data platform jobs and tools on Kubernetes
● Built and managed streaming data pipelines using Debezium, Kafka, and PySpark
● Automated data integration via REST APIs, Python, and Bash scripts
● Orchestrated job management with Airflow
● Designed dashboards utilizing Metabase
● Established and managed a Data Warehouse on Snowflake (self-developed)
● Managed CI/CD processes on GitLab
Unite
Mar 2021 – Apr 2022
Leipzig, Germany
Data Engineer
● Created and maintained a data platform including custom ETL, orchestration, infrastructure, and the data warehouse in the AWS environment (Services used: Redshift, RDS, Glue, Athena, S3, Lambda, Kinesis, CloudFormation, IAM, ECS, EC2, VPC, DMS, CloudWatch)
● Established efficient custom ETL frameworks using Python and PySpark, built on top of AWS services
● Migrated legacy ETLs from Pentaho to a custom solution, increasing flexibility and reducing execution time
● Maintained data transformations in the Data Warehouse using dbt
● Implemented deployment management with CloudFormation and CodePipeline
● Established and maintained streaming data pipelines from Postgres using DMS and Kinesis
● Maintained data lake using S3
● Managed job orchestration using Dagster
● Designed dashboards with Redash and Tableau
● Managed CI/CD processes on GitLab
Gojek
Jun 2019 – Mar 2021
Jakarta, Indonesia
Data Warehouse Engineer
● Constructed and managed a data warehouse in the GCP environment (Services used: BigQuery, Cloud Storage, Kubernetes Engine, Compute Engine, IAM, Pub/Sub, Cloud Registry)
● Created a standalone platform to deliver data from BigQuery to any API
● Standardized the existing data warehouse across naming conventions, data types, data retention, and data quality controls
● Developed ETL frameworks using Docker, Python, and Bash for diverse data sources / APIs
● Built a Kafka streaming pipeline using Debezium, with a sink to BigQuery
● Managed job orchestration using Airflow
● Implemented the Kimball approach to data warehouse construction using Slowly Changing Dimension tables
● Created data standardization platform utilizing Flask and Vue.js