Shubham Maurya
Data Engineer | Python, SQL, Spark, ETL

741 followers
Delhi, India

  • About me

    Data Engineer @Ecom Express | Python | Spark | SQL | ETL | DTU'22

  • Education

    • Delhi Technological University (Formerly DCE)

      2018 - 2022
      Bachelor of Technology - BTech
  • Experience

    • Ecom Express Limited

      Jun 2022 - Present
      Data Engineer | Python, SQL, Spark, ETL

      Worked on optimizing data pipelines on AWS cloud services (EC2, EMR, Glue, Athena), using Python, Spark, and SQL for data processing. Explored open-source technologies such as Hudi, Airflow, Kafka, and Debezium to streamline data workflows.

      - Designed a warehouse pipeline to process micro-batches of real-time fact tables and upsert the latest data into the warehouse processed layer, ensuring ACID data transactions.
      - Designed a pipeline to incrementally bulk-ingest multiple MySQL tables spanning 7 years of data from MySQL machines into an Apache Hudi lakehouse stored in S3.
      - Utilized AWS Lambda for file-based triggers and automated archival of older data lake data on S3, ensuring data integrity and reducing storage costs.
      - Developed Spark pipelines for real-time data processing from Kafka to S3, enhancing data analysis capabilities.
      - Designed Airflow DAGs to provision EMR clusters and orchestrate Spark job execution for batch data processing workflows on historical data.
      - Established robust monitoring jobs for Spark applications, implementing alerting mechanisms for uninterrupted operations.
      - Developed and optimized production SQL queries, enhancing data retrieval efficiency by 30%.

  • Licenses & Certifications