Shubham Maurya
Data Engineer | Python, SQL, Spark, ETL

741 followers
Delhi, India

  • About me

    Data Engineer @Ecom Express | Python | Spark | SQL | ETL | DTU'22

  • Education

    • Delhi Technological University (Formerly DCE)

      2018 - 2022
      Bachelor of Technology - BTech
  • Experience

    • Ecom Express Limited

      Jun 2022 - Present
      Data Engineer | Python, SQL, Spark, ETL

      Worked on optimizing data pipelines on AWS cloud services (EC2, EMR, Glue, Athena), using Python, Spark, and SQL for data processing. Explored open-source technologies such as Hudi, Airflow, Kafka, and Debezium to streamline data workflows.

      - Designed a warehouse pipeline to process micro-batches of real-time fact tables and upsert the latest data into the warehouse processed layer, ensuring ACID data transactions.
      - Designed a pipeline to incrementally bulk-ingest multiple MySQL tables spanning 7 years of data from MySQL machines into an Apache Hudi lakehouse stored in S3.
      - Utilized AWS Lambda for file-based triggers and automated archival of older data lake data on S3, ensuring data integrity and reducing storage costs.
      - Developed Spark pipelines for real-time data processing from Kafka to S3, enhancing data analysis capabilities.
      - Designed Airflow DAGs to provision EMR clusters and orchestrate Spark job execution for batch data processing workflows on historical data.
      - Established robust monitoring jobs for Spark applications, implementing alerting mechanisms for uninterrupted operations.
      - Developed and optimized production SQL queries, enhancing data retrieval efficiency by 30%.

  • Licenses & Certifications