Mansi Bhardwaj

1000 followers
Faridabad, Haryana, India

  • About me

    Data Engineer/Analyst | SQL | Python | Jenkins | Git | Docker | Airflow | Data Pipeline | Polars | DuckDB | Pytest Framework | Data Warehousing | Data Modeling

  • Education

    • Maitreyi College, University of Delhi

      2016 - 2019
      BSc. Mathematics (Honours)
    • Delhi Technological University

      2020 - 2022
      MSc Applied Mathematics
  • Experience

    • Galytix

      Jul 2022 - Aug 2024

      Client: Foyer Group Health (Jan'23 - March'24)
      Responsibility –
        • Improved the existing pipeline structure by adding validation and profiling features to detect anomalies at an early stage.
        • Validation consists of table-level and column-level analysis; profiling compares the current data with the previous run to check whether anomalies are present in the latest data.
        • Applied object-oriented programming (OOP) principles to create a data pipeline that follows ETL processes in adherence to BDCR (Build, Detect, Correct, Repair) principles.
        • Created pipeline run and step information artifacts so that end users can determine whether a pipeline run was successful.
        • Automated pipeline execution with Apache Airflow, replacing manual triggers with scheduled runs.
        • Built test cases to verify the functionality of the pipeline and ensure its reliability.
      Tech Stack – Git, Linux, Python, Pandas, PyTest, Apache Airflow

      Client: SocGen (Apr'24 - Present)
      Responsibility –
        • Enhanced the performance of a data processing pipeline handling 225GB of data.
        • Used Polars to streamline the workflow and reduce processing time from 10 hours to 2.5 hours.
        • Implemented robust error handling and logging mechanisms to ensure data integrity and facilitate troubleshooting during data processing.
        • Conducted thorough performance testing and benchmarking to identify bottlenecks and further optimize the efficiency of the pipeline.
        • Collaborated with cross-functional teams, including data scientists and analysts, to integrate the optimized pipeline into the broader data analytics framework, ensuring seamless operation and usability.
      Tech Stack – Git, Polars, Python

      Responsibility –
        • Performed EDA on semi-structured raw data in JSON format using pandas to extract insights and relevant information from the data.
        • Interacted with the client to discuss and propose our understanding of the data. Incorporated essential checks such as a data quality score and logging of pipeline runs and step information to enhance the efficiency of the data pipeline.
        • Implemented the Polars module to optimize the data processing pipeline, resulting in a significant reduction in pipeline run time.
        • Used Git for version control to manage code changes and facilitate collaboration among team members.
        • Used Linux as the operating environment for running and automating the data processing tasks.
      Tech Stack – Git, Linux, Python, Pandas, Polars

      • Data Analyst

        Jan 2023 - Aug 2024
      • Data Analyst Intern

        Jul 2022 - Dec 2022
    • Fidelity International

      Aug 2024 - now
      Database Developer - Data Analyst
  • Licenses & Certifications