Jahnavi M

Location: Plano, Texas, United States
Phone: +91 xxxx xxxxx
Followers: 582
  • Timeline

    Jun 2016 - Dec 2017: Hadoop Consultant, Cube IT Innovations Private Limited
    Jan 2018 - Jul 2020: Big Data Engineer, GreenByte Technologies, LLC
    Jan 2023 - Jul 2024: Data Engineer, Intralot
    Aug 2024 - Present: Senior Data Engineer, Entergy
  • About me

    Sr. Data Engineer

  • Education

    • Velagapudi Ramakrishna Siddhartha Engineering College

      Bachelor of Technology (B.Tech), Computer Science
    • Rutgers University

      Master's degree, Computer Science
  • Experience

    • Cube IT Innovations Private Limited

      Jun 2016 - Dec 2017
      Hadoop Consultant

      I developed Sqoop scripts for change data capture, processing incremental records between new and existing RDBMS data, and loaded aggregated data from Hadoop into Oracle using Sqoop for dashboard reporting. I created Hive scripts to analyze and process large datasets, designing clustered tables for cross-examination by Hive and MapReduce jobs. I collaborated with the DevOps team to design end-to-end workflows, using Oozie to automate Hadoop jobs. I used Impala to query Hadoop data stored in HDFS and worked with NoSQL databases such as HBase. I worked across several Azure services, including Azure SQL Database, Azure Data Warehouse, Azure Analysis Services, HDInsight, Data Lake, and Data Factory, extracting data and loading it into Azure Data Lake via Sqoop for business access. I developed Pig scripts to transform raw data into business-ready insights. I used Apache Spark with Python for big data analytics and machine learning applications, leveraging Spark ML and MLlib, and analyzed large datasets to optimize aggregation and reporting. I wrote MapReduce programs in Java for data extraction, transformation, and aggregation across multiple file formats, including XML, JSON, and CSV.
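
      The change-data-capture step above can be sketched as a small Python wrapper around the Sqoop CLI. This is a minimal illustration, not the project's actual code: the connection string, table, and column names are hypothetical placeholders.

      import subprocess

      def incremental_import(last_value: str) -> None:
          """Pull only rows changed since last_value and merge them on the key."""
          cmd = [
              "sqoop", "import",
              "--connect", "jdbc:oracle:thin:@db-host:1521/ORCL",  # placeholder DSN
              "--username", "etl_user",                            # placeholder account
              "--password-file", "/user/etl/.sqoop_pw",
              "--table", "ORDERS",              # hypothetical source table
              "--incremental", "lastmodified",  # CDC mode: fetch only changed rows
              "--check-column", "LAST_UPD_TS",  # column that tracks row updates
              "--last-value", last_value,       # high-water mark from the prior run
              "--merge-key", "ORDER_ID",        # reconcile updates with existing rows
              "--target-dir", "/data/raw/orders",
          ]
          subprocess.run(cmd, check=True)

      incremental_import("2017-06-01 00:00:00")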

    • GreenByte Technologies, LLC

      Jan 2018 - Jul 2020
      Big Data Engineer

      I was involved in the complete Big Data flow of the application, from data ingestion into HDFS through processing and analysis. I built ETL data pipelines using Apache Airflow in GCP and created Hive tables to import large datasets from relational databases via Sqoop, enabling BI teams to generate reports. I developed batch ingestion processes for CSV files and used Sqoop for data transfers, leveraging GCP services such as Dataproc, GCS, Cloud Functions, and BigQuery. I collaborated with business process managers to transform vast data volumes and generate business intelligence reports using Hive, Spark, Sqoop, and NiFi. I conducted real-time credit card fraud analysis with Spark (Spark SQL, MLlib) and AWS, implementing partitioning, dynamic partitions, and bucketing in Hive. I used Amazon S3 for Hadoop data storage, migrated data from Oracle and MySQL into HDFS using Sqoop, and performed Big Data integration using Hadoop, Solr, Spark, Kafka, and Storm. I worked with NoSQL databases such as Cassandra and MongoDB, leveraging HBase for processing and data transfer. Additionally, I developed ETL workflows integrating Informatica PowerCenter with Hadoop HDFS, optimizing workflow performance and scalability. I continuously monitored and managed Hadoop clusters via Cloudera Manager, performed Hive optimizations, diagnosed and resolved performance issues, and contributed throughout the software development lifecycle.
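
      The batch CSV ingestion described above can be sketched as a short Airflow DAG. This is a hedged illustration assuming Airflow 2.x; the DAG id, bucket, and dataset/table names are hypothetical, and the real pipeline may well have used Dataproc or NiFi tasks instead of shell commands.

      from datetime import datetime

      from airflow import DAG
      from airflow.operators.bash import BashOperator

      with DAG(
          dag_id="daily_csv_to_bigquery",  # hypothetical DAG id
          start_date=datetime(2020, 1, 1),
          schedule_interval="@daily",
          catchup=False,
      ) as dag:
          # Stage the day's CSV drop into a dated GCS prefix (bucket is illustrative).
          stage_csv = BashOperator(
              task_id="stage_csv",
              bash_command=(
                  "gsutil cp /data/incoming/*.csv "
                  "gs://example-raw-bucket/csv/{{ ds }}/"
              ),
          )
          # Load the staged files into a date-partitioned BigQuery table.
          load_to_bq = BashOperator(
              task_id="load_to_bigquery",
              bash_command=(
                  "bq load --source_format=CSV --skip_leading_rows=1 "
                  "'analytics.transactions${{ ds_nodash }}' "
                  "gs://example-raw-bucket/csv/{{ ds }}/*.csv"
              ),
          )
          stage_csv >> load_to_bq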

    • Intralot

      Jan 2023 - Jul 2024
      Data Engineer

      In this project, I developed scalable ETL pipelines and real-time data processing solutions on Big Data and cloud platforms. I worked with Hadoop stack components such as Hive, Pig, HBase, and Sqoop, writing MapReduce programs to process data from multiple sources. Using Python, I built interactive web-based solutions and leveraged GCP services such as Dataproc, BigQuery, and Cloud Storage for data processing. I implemented ETL pipelines in Azure using Data Factory, Spark SQL, and U-SQL, ingesting data into Azure Data Lake, Azure SQL, and Data Warehouse. I optimized PySpark scripts for large-scale data processing and automated infrastructure deployment with Terraform. Additionally, I integrated Airflow workflows, scheduled jobs with Oozie, and executed Hive queries against AWS S3-based data storage. I implemented real-time streaming solutions using Databricks Delta Lake and Kafka, and designed Tableau dashboards for data visualization. My role also involved working with Docker, Kubernetes, and OpenShift for containerized deployments, integrating Snowflake with Talend ETL, and building Python RESTful APIs for real-time analytics. Throughout the project, I ensured efficient data pipelines, optimized performance, and integrated multi-cloud environments, utilizing Python, Hadoop, Spark, Cassandra, Azure, GCP, AWS, and Snowflake.
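
      The Kafka-to-Delta streaming piece above can be sketched with PySpark Structured Streaming. The broker, topic, paths, and event schema are assumptions made for illustration, and the sketch presumes the Delta Lake package is available (as on Databricks); the actual payload layout was not given.

      from pyspark.sql import SparkSession
      from pyspark.sql.functions import col, from_json
      from pyspark.sql.types import StringType, StructField, StructType, TimestampType

      spark = SparkSession.builder.appName("kafka_to_delta").getOrCreate()

      # Hypothetical event schema; the real payload layout was not given.
      schema = StructType([
          StructField("event_id", StringType()),
          StructField("event_type", StringType()),
          StructField("event_ts", TimestampType()),
      ])

      events = (
          spark.readStream.format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
          .option("subscribe", "game-events")                # placeholder topic
          .load()
          .select(from_json(col("value").cast("string"), schema).alias("e"))
          .select("e.*")
      )

      # Append each micro-batch to a Delta table; the checkpoint gives the sink
      # restartable, exactly-once semantics.
      query = (
          events.writeStream.format("delta")
          .option("checkpointLocation", "/delta/_checkpoints/game_events")
          .outputMode("append")
          .start("/delta/game_events")
      )
      query.awaitTermination()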

    • Entergy

      Aug 2024 - Present
      Senior Data Engineer

      In this project, I developed real-time data processing pipelines using Spark Streaming, Kafka, and Scala, storing processed data in Cassandra for high-performance querying. I leveraged Python and PySpark for efficient data ingestion and transformation in a Hadoop/Hive environment. I built ETL data pipelines using HDFS, Hive, Presto, Apache NiFi, Sqoop, Spark, Elasticsearch, and Kafka, optimizing data processing and analytics workflows. Additionally, I migrated an on-premises application to Azure, implementing Azure Data Lake, Azure Data Factory, and Azure SQL solutions for seamless data integration. I also worked with Google Cloud Platform (GCP), building data pipelines in Airflow and using BigQuery, Dataproc, and Cloud Functions for scalable data processing. My responsibilities included developing batch processing solutions in Azure Databricks, optimizing Snowflake queries, and designing high-performance ETL workflows. I also automated data ingestion and processing using Kafka, Flume, and Sqoop, ensuring efficient data movement into HDFS, Hive, and HBase. Furthermore, I implemented serverless data pipelines using Azure Functions and AWS Glue, integrated Jenkins for CI/CD, and visualized insights in Tableau to support business decision-making.
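
      The Kafka-to-Cassandra flow above used Spark Streaming with Scala; as a language-consistent stand-in, the plain-Python consumer below sketches the same movement of records, not the project's actual job. The topic, keyspace, table, and host names are hypothetical, and the kafka-python and cassandra-driver packages are assumed.

      import json

      from cassandra.cluster import Cluster
      from kafka import KafkaConsumer

      # Placeholder endpoints; the actual cluster addresses were not given.
      consumer = KafkaConsumer(
          "meter-readings",  # hypothetical topic
          bootstrap_servers=["broker:9092"],
          value_deserializer=lambda b: json.loads(b.decode("utf-8")),
          auto_offset_reset="earliest",
      )

      session = Cluster(["cassandra-host"]).connect("telemetry")  # keyspace assumed
      insert = session.prepare(
          "INSERT INTO readings (meter_id, ts, kwh) VALUES (?, ?, ?)"
      )

      # Persist each consumed record for low-latency point queries.
      for msg in consumer:
          r = msg.value
          session.execute(insert, (r["meter_id"], r["ts"], r["kwh"]))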

  • Licenses & Certifications

    • Introduction to Probability and Data with R

      Coursera
      Jun 2021