Jashwanth Dokka

Data Engineer Intern

894 followers
Bridgeport, Connecticut, United States

  • About me

    Actively seeking full-time roles | Experienced Data Engineer | AWS Certified | ETL Expert | Cloud Data Solutions | PySpark | AWS Glue | RDS | Redshift | Snowflake | Informatica PowerCenter | IICS

  • Education

    • PRASAD V POTLURI SIDDHARTHA INSTITUTE OF TECHNOLOGY

      2015 - 2019
      Bachelor of Technology (BTech), Information Technology
    • Pace University - Seidenberg School of Computer Science and Information Systems

      2023 - 2025
      Master's degree, Computer Science
  • Experience

    • Adani Group

      Feb 2018 - Mar 2019
      Data Engineer Intern

      ● Developed and refined data processing workflows using Hadoop ecosystem components such as HDFS, MapReduce, and Hive, enabling batch processing and analytics on 50TB+ datasets.
      ● Led the development of end-to-end data pipelines in Azure, incorporating services such as Azure Data Factory and Azure Databricks, and automating 95% of ETL jobs.
      ● Automated ETL workflows in Informatica, DataStage, and Ab Initio, reducing manual handling by 30% and boosting data processing speed, while enhancing data quality in Snowflake with Airflow integration.
      ● Built backend applications for real-time data integration, automating data loading with Oracle SQL, PL/SQL, T-SQL, and Unix shell scripting, leading to a 60% improvement in database performance and reducing query execution time by 50%.
      ● Constructed, improved, and secured CI/CD pipelines using Apache Airflow, Docker, Kubernetes, and SQL, increasing deployment speed by 40% and reducing manual intervention by 70%.
      ● Reduced operational costs and improved data processing speed by 20% through pipeline optimizations, while mentoring junior data engineers to support team growth and development.
      ● Leveraged Snowflake's architecture for external data sharing, supporting data monetization strategies, and collaborated on end-to-end data pipelines, optimizing ETL job performance and modernizing infrastructure, including ETL processes.
      ● Configured and managed Apache Kafka clusters to ensure high availability, fault tolerance, and scalability of the data streaming infrastructure, handling terabytes of data per day.
      ● Introduced data visualization solutions with Power BI and Tableau, transforming complex datasets into actionable insights, and contributed to ERD development with a focus on data security and compliance.
      ● Administered version control systems such as Git for code management, ensuring code integrity and facilitating team collaboration.

    • Capgemini

      May 2019 - Nov 2021
      Data Engineer/ Senior Software Engineer

      ● Migrated an Oracle database to the AWS cloud using Amazon DMS with 100% availability, enabling enhanced scalability and cost savings for enterprise data solutions.
      ● Built and accelerated Spark SQL queries and data processing tasks within the Spark ecosystem, boosting query performance by 50% and processing 10TB+ of data 3x faster for large-scale analytics.
      ● Identified and evaluated pertinent data sources, such as product catalogs, inventory databases, and user purchase history, optimizing AWS data storage and retrieval via services like Amazon S3 and RDS.
      ● Crafted and deployed Azure SQL Database solutions to manage datasets exceeding 20 GB, reducing query latency by 15%.
      ● Played a pivotal role in designing and deploying cloud-based infrastructure for data pipelines, leveraging AWS services such as CloudFormation, RDS, EC2, Lambda, Athena, S3, DynamoDB, SQS, Redshift, SNS, AWS Glue, CloudWatch, and IAM, with proficiency in Python and PySpark.
      ● Spearheaded the development of an end-to-end data engineering pipeline, from data source identification to schema development, API integration, and data transformation, leveraging AWS services such as Amazon S3, RDS, and Glue.
      ● Engineered an efficient data storage and processing system, optimizing algorithms for item buyability and increasing data accuracy by 98%, while seamlessly integrating with AWS services such as EMR and Lambda.

    • Cognizant

      Nov 2021 - Jan 2023
      Data Engineer/ Programmer Analyst

      ● Migrated a SQL Server database to the AWS cloud, ensuring a seamless transition with 99.99% availability of critical data systems.
      ● Applied and refined ETL processes in Snowflake, leveraging SQL, Snowpark, and Snowflake features to ensure seamless data extraction, transformation, and loading.
      ● Elevated query performance by 30% through in-depth performance optimization, including use of Snowflake's advanced SQL features.
      ● Architected and supervised a robust data warehousing solution using Snowflake, enabling the organization to store and analyze 50TB+ of data.
      ● Formulated new data validation processes using Python and PySpark, achieving 99% data quality and integrity through data profiling and cleansing.
      ● Built, tested, and administered data management systems using Apache Spark and Hadoop ecosystem tools such as Hive and HDFS, improving data processing speed 3x for large-scale analytics.
      ● Applied data security and governance measures, including encryption and access controls, ensuring 100% compliance with data privacy regulations.
      ● Created real-time data ingestion pipelines using Snowpipe, enabling near-real-time access to critical data and reducing data latency from 24 hours to under 5 minutes, while also implementing disaster recovery and high availability strategies.
      ● Upgraded data management efficiency and accuracy through proficient use of Snowflake features including data retention, Time Travel, clustering, and materialized views; executed data masking and fail-safe mechanisms to ensure 100% data integrity.
      ● Revamped decision-making processes by leveraging Snowflake functionalities such as data sharing and task scheduling; evaluated features like clones and streams to facilitate seamless data operations and enhance overall system reliability.

    • Blue Cross Blue Shield Association

      Feb 2025 - Present
      Data Engineer

      ● Implemented data ingestion and processing solutions using Big Data technologies such as Hadoop, Spark, and Hive, handling over 2 million records monthly and optimizing workflows for improved efficiency.
      ● Engineered and optimized ETL pipelines using Azure services such as Data Lake, Synapse Analytics, Data Factory, and Databricks, ensuring seamless data integration, transformation, and storage.
      ● Amplified data quality and reporting by optimizing SQL queries, resolving discrepancies, applying Snowflake schema modeling, and creating Power BI dashboards for actionable business insights.
      ● Collaborated with senior management and IT teams to define business requirements, lead data-driven initiatives, and drive project success through effective communication and execution.

  • Licenses & Certifications