
About me
Sr. Data Engineer
Education

Velagapudi Ramakrishna Siddhartha Engineering College
- Bachelor of Technology (BTech), Computer Science
Rutgers University
- Master's degree, Computer Science
Experience

Cube IT Innovations Private Limited
Jun 2016 - Dec 2017
Hadoop Consultant
I developed Sqoop scripts for change data capture, processing incremental records between new and existing RDBMS data, and used Sqoop to load aggregated data from Hadoop into Oracle for dashboard reporting. I created Hive scripts to analyze and process large datasets and designed clusters for cross-examination in Hive and MapReduce jobs. I collaborated with the DevOps team to design end-to-end workflows, using Oozie to automate Hadoop jobs. I used Impala to query Hadoop data stored in HDFS and worked with NoSQL databases such as HBase. I worked with various Azure services, including Azure SQL Database, Azure SQL Data Warehouse, Azure Analysis Services, HDInsight, Data Lake, and Data Factory, and used Sqoop to extract data and load it into Azure Data Lake for business access. I developed Pig scripts to transform raw data into business-ready insights. I used Apache Spark with Python for big data analytics and machine learning applications, leveraging Spark ML and MLlib. I analyzed large datasets to optimize aggregation and reporting. I wrote MapReduce programs in Java to extract, transform, and aggregate data from multiple file formats, such as XML, JSON, and CSV.

Greenbyte Technologies, LLC
Jan 2018 - Jul 2020
Big Data Engineer
I was involved in the complete big data flow of the application, from data ingestion into HDFS through processing and analysis. I built ETL data pipelines using Apache Airflow in GCP and created Hive tables to import large datasets from relational databases via Sqoop, enabling BI teams to generate reports. I developed batch ingestion processes for CSV files and used Sqoop for data transfers, leveraging GCP services such as Dataproc, GCS, Cloud Functions, and BigQuery. I collaborated with business process managers to transform large data volumes and generate business intelligence reports using Hive, Spark, Sqoop, and NiFi. I performed real-time credit card fraud analysis with Spark (Spark SQL, MLlib) and AWS, and implemented partitioning, dynamic partitions, and bucketing in Hive. I used Amazon S3 for Hadoop data storage, migrated data from Oracle and MySQL into HDFS using Sqoop, and performed big data integration using Hadoop, Solr, Spark, Kafka, and Storm. I worked with NoSQL databases such as Cassandra and MongoDB, and used HBase for processing and data transfer. I also developed ETL workflows integrating Informatica PowerCenter with Hadoop HDFS, improving workflow performance and scalability. I continuously monitored and managed Hadoop clusters through Cloudera Manager, performed Hive optimizations, diagnosed and resolved performance issues, and contributed throughout the software development lifecycle.

Intralot
Jan 2023 - Jul 2024
Data Engineer
In this project, I developed scalable ETL pipelines and real-time data processing solutions on big data and cloud platforms. I worked with Hadoop stack components such as Hive, Pig, HBase, and Sqoop, and wrote MapReduce programs to process data from multiple sources. I built interactive web-based solutions in Python and used GCP services such as Dataproc, BigQuery, and Cloud Storage for data processing. I implemented ETL pipelines in Azure using Data Factory, Spark SQL, and U-SQL, ingesting data into Azure Data Lake, Azure SQL, and Data Warehouse. I optimized PySpark scripts for large-scale data processing and automated infrastructure deployment with Terraform. I also integrated Airflow workflows, scheduled jobs with Oozie, and ran Hive queries on data stored in AWS S3. I implemented real-time streaming solutions using Databricks Delta Lake and Kafka, and designed Tableau dashboards for data visualization. My role also involved working with Docker, Kubernetes, and OpenShift for containerized deployments, integrating Snowflake with Talend ETL, and building Python RESTful APIs for real-time analytics. Throughout the project, I ensured efficient data pipelines, optimized performance, and integrated multi-cloud environments, using Python, Hadoop, Spark, Cassandra, Azure, GCP, AWS, and Snowflake.

Entergy
Aug 2024 - Present
Senior Data Engineer
In this project, I developed real-time data processing pipelines using Spark Streaming, Kafka, and Scala, storing processed data in Cassandra for high-performance querying. I used Python and PySpark for efficient data ingestion and transformation in a Hadoop/Hive environment. I built ETL data pipelines using HDFS, Hive, Presto, Apache NiFi, Sqoop, Spark, Elasticsearch, and Kafka, optimizing data processing and analytics workflows. I also migrated an on-premises application to Azure, implementing Azure Data Lake, Azure Data Factory, and Azure SQL for data integration. I worked with Google Cloud Platform (GCP) to build data pipelines in Airflow and used BigQuery, Dataproc, and Cloud Functions for scalable data processing. My responsibilities included developing batch processing solutions in Azure Databricks, optimizing Snowflake queries, and designing high-performance ETL workflows. I automated data ingestion and processing using Kafka, Flume, and Sqoop, moving data efficiently into HDFS, Hive, and HBase. In addition, I implemented serverless data pipelines using Azure Functions and AWS Glue, integrated Jenkins for CI/CD, and built Tableau visualizations to support business decision-making.
Licenses & Certifications

Introduction to Probability and Data with R
Issuing organization: Coursera
Issued: Jun 2021