
Satya Sai Teja Jasthi
ETL Developer

About me
Sr Data Engineer
Education

Bradley University - Master's degree
Lovely Professional University - Bachelor's degree
Experience

Hexaware Technologies
Nov 2014 - Apr 2016
ETL Developer
Created ETL pipelines to move data from legacy systems to a Hadoop cluster on a reporting project. Also involved in data preprocessing, data cleansing, business-requirement validation, functional specification design for schema and table construction, and Hive DWH query performance optimization.
Responsibilities:
• Worked with the Hortonworks distribution; installed, configured, and maintained a Hadoop cluster to meet organizational needs.
• Created new mapping designs using Informatica Designer tools, including Source Analyzer, Warehouse Designer, Mapplet Designer, and Mapping Designer.
• Built mappings to technical specifications using the appropriate transformations in the Informatica tool.
• Developed complex mappings implementing business logic to feed data into the staging area.
• Developed mappings and sessions using Informatica PowerCenter for data loading, reusing Informatica components at different stages of development.
• Designed, built, and deployed an ETL process using IICS Data Integration.
• Applied performance-tuning techniques extensively while loading data into Azure Synapse with IICS.
• Performed data manipulations with Informatica transformations such as Filter, Expression, Aggregator, Update Strategy, Normalizer, Joiner, Router, Sorter, and Union.
• Wrote Bash scripts to retrieve log files from FTP servers and ran Hive jobs to process and analyze them.
• Developed processes for transferring data from all source systems into the data warehousing system.
• Set up the staging environment and populated it with data gathered from various sources.
• Analyzed business process activities to help develop ETL processes for moving data from source to target systems.
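A minimal sketch of the staging-area mapping pattern this role describes, expressed as a Filter plus Expression transformation in plain Python. The field names ("cust_id", "amount") and the threshold are hypothetical, not from the actual project:

```python
# Toy Filter + Expression transformation: reject incomplete rows, trim strings,
# and derive a flag column before rows land in the staging area.
def filter_and_transform(rows):
    """Drop rows missing a customer id, trim string fields, derive a flag."""
    out = []
    for row in rows:
        if not row.get("cust_id"):   # Filter transformation: reject incomplete rows
            continue
        # Expression transformation: cleanse values and derive a new field
        cleaned = {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}
        cleaned["high_value"] = cleaned["amount"] > 1000
        out.append(cleaned)
    return out
```

In a real PowerCenter mapping these steps would be separate Filter and Expression transformation objects; here they are collapsed into one function for illustration.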

Sonata Software
May 2016 - Feb 2017
Jr Data Engineer
As a Data Engineer at Target, I played a key role in setting up and managing the Hadoop ecosystem on GCP and oversaw application migrations using Google Dataflow. I partnered with product teams to build store-level metrics and data pipelines, using tools such as Sqoop, PySpark, and Airflow for automation and data processing.
Responsibilities:
• Implemented and managed data transformations in the Azure environment using PySpark.
• Wrote T-SQL scripts to synchronize and migrate data between multiple database systems.
• Enhanced Tableau data models to meet complex business needs while maintaining data integrity and accuracy.
• Built custom Talend routines and components to solve difficult integration problems and complex data transformations.
• Developed and refined complex stored procedures, functions, and queries in PL/SQL.
• Created Unix shell scripts to automate Azure-based data integration workflows.
• Used Spark Streaming to divide streaming data into batches for input to the Spark engine.
• Wrote Spark applications for data validation, cleansing, transformation, and custom aggregation.
• Developed REST APIs in Python with Flask and Django for integration with various data sources.
• Used Apache Spark with Python to build Big Data analytics and machine learning applications.
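The Spark Streaming bullet above describes the micro-batch model: an unbounded stream is split into small batches that are each handed to the batch engine. A pure-Python sketch of that batching step (batch_size stands in for the streaming interval; this is an analogy, not Spark's API):

```python
# Mimic Spark Streaming's micro-batching: group an unbounded stream of records
# into fixed-size batches for downstream batch processing.
def micro_batches(stream, batch_size):
    """Yield fixed-size batches from an iterable of records."""
    batch = []
    for record in stream:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:            # flush the final partial batch
        yield batch
```

In actual Spark Streaming, the interval is time-based rather than count-based, and each batch becomes an RDD processed by the Spark engine.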

PLZ Corp
Apr 2017 - Feb 2019
Big Data Engineer
PLZ Corp specializes in the development, manufacturing, packaging, and distribution of a comprehensive range of private-label products. I handled the company's data closely, making sure it was accurately gathered, processed, and stored. My duties included creating and refining data pipelines to support business administration and payment integrity. The data was then loaded into centralized data warehouses, enabling thorough reporting and analytics for risk evaluation, premium computation, and client-specific insights.
Responsibilities:
• Designed and executed end-to-end data solutions on Cloudera, Hortonworks, MapR, Snowflake, and Apache Airflow, leveraging Hadoop, Hive, and Pig.
• Developed and oversaw distributed data solutions and ETL pipelines using Big Data technologies including AWS, GCP, Azure cloud services, the Databricks platform, and Hadoop ecosystem components.
• Developed and managed cloud infrastructure as code (IaC) with Terraform to automate provisioning of AWS services such as EC2, S3, and VPCs.
• Performed data migration, profiling, ingestion, cleansing, transformation, and export with ETL tools such as Talend Open Studio for Big Data.
• Developed and optimized data solutions using SQL Server, MSBI, and Azure cloud.
• Worked with Azure Cosmos DB, Azure Synapse Analytics, Azure Data Factory, Azure Data Lake Storage, and Azure analytical services.
• Applied dimensional data modeling, including star-join schema and snowflake modeling, with tools such as ER/Studio, Erwin, and Sybase PowerDesigner.
• Maintained a thorough understanding of the AWS platform and its features, including CloudFormation, CloudWatch, CloudTrail, EBS, VPC, RDS, and IAM.
• Worked with CloudFront, CloudFormation, S3, Athena, SNS, SQS, Glue, RDS, DynamoDB, EC2, ECS, Elastic Beanstalk, Lambda, and Elastic Load Balancing.
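The dimensional-modeling bullet above refers to star schemas, where a fact table holds measures plus surrogate keys into dimension tables. A rough sketch of that split in plain Python; the table and column names are invented for illustration:

```python
# Split raw order records into a customer dimension table and a fact table,
# assigning surrogate keys the way a star-schema load would.
def build_star_schema(orders):
    """Return (dimension rows, fact rows) for a list of raw order records."""
    customer_dim = {}            # natural key -> surrogate key
    dim_rows, fact_rows = [], []
    for o in orders:
        nk = o["customer"]
        if nk not in customer_dim:
            customer_dim[nk] = len(dim_rows) + 1      # next surrogate key
            dim_rows.append({"customer_sk": customer_dim[nk], "name": nk})
        fact_rows.append({"customer_sk": customer_dim[nk], "amount": o["amount"]})
    return dim_rows, fact_rows
```

A snowflake schema would further normalize the dimension table into sub-dimensions; the surrogate-key lookup step stays the same.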

Intetics
Mar 2019 - Sept 2021
GCP Data Engineer
At Intetics, our team's project was to develop a new application for MedForward, a business partner of Intetics. MedForward helps pharmacies drive more traffic to their stores, mostly through its pharmacy finder, a gateway to MTM (medication therapy management). These pharmacies make money by performing MTM cases. Eventually, MedForward will start offering products to pharmacies directly through the application. Our team developed a new site for a partner company to use internally to manage its clients, as well as an external site for the partner company's clients.
Responsibilities:
• Created and implemented end-to-end data pipelines for processing large volumes of bank transaction data using GCP technologies.
• Demonstrated proficiency in SQL and BigQuery, creating and refining complex queries to retrieve valuable insights from terabytes of transactional data.
• Developed a comprehensive understanding of clinical workflows, patient data management, and healthcare processes.
• Used SSIS, SSAS, and SSRS to improve the accuracy and efficiency of data processing in the project.
• Developed complex ETL procedures using Python, Hadoop, and PySpark to convert unstructured transaction data into a structured format for subsequent analysis.
• Used Spark to read data from various sources, such as files and RDBMS, and process it through actions and transformations.
• Diagnosed issues, debugged, and tuned SQL and PL/SQL code to maximize application performance.
• Constructed Python DAGs in Apache Airflow to orchestrate complete data pipelines for diverse uses.
• Used Apache Spark-based analytics with Azure Databricks, enabling collaborative data science and engineering.
• Created enterprise-level solutions with streaming frameworks (Apache Kafka, Spark Streaming, and Flink) and batch processing (Apache Pig).
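The Airflow bullet above relies on DAG semantics: a task runs only after all its upstream tasks complete. A toy scheduler sketch of that ordering in plain Python (task names are hypothetical; a real pipeline would use airflow.DAG and operators instead):

```python
# Compute a valid run order for tasks with upstream dependencies,
# the core guarantee an Airflow DAG provides.
def topo_order(deps):
    """deps maps task -> set of upstream tasks; return a valid run order."""
    order, done = [], set()
    while len(done) < len(deps):
        ready = [t for t, ups in deps.items() if t not in done and ups <= done]
        if not ready:
            raise ValueError("cycle detected in DAG")
        for t in sorted(ready):   # deterministic order for ties
            order.append(t)
            done.add(t)
    return order
```

In Airflow the same dependencies would be declared with `extract >> transform >> load`, and the scheduler performs this ordering (plus retries, scheduling intervals, and parallelism) for you.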

American Express
Oct 2021 - Nov 2022
Cloud Engineer
As a cloud engineer at American Express, I work on cloud services and use ETL tools such as Informatica to ensure the smooth flow and transformation of financial data. Working in the field of financial planning and advisory services, I create and carry out data integration procedures that gather relevant financial data from several sources, modify it in accordance with predetermined business standards and guidelines, and load it into data warehouse and analytical systems.
Responsibilities:
• Designed and implemented end-to-end data solutions using Hadoop, Hive, and Pig on Big Data platforms including Cloudera, Hortonworks, MapR, Snowflake, and Apache Airflow.
• Built and managed distributed data solutions and ETL pipelines leveraging Big Data technologies such as Hadoop ecosystem components, the Databricks platform, and AWS, GCP, and Azure cloud services.
• Developed and managed cloud infrastructure as code (IaC) using Terraform to automate provisioning of AWS resources such as EC2, S3, and VPCs.
• Performed data migration, profiling, ingestion, cleansing, transformation, and export using ETL tools such as Talend Open Studio for Big Data.
• Developed and optimized data solutions with SQL Server, MSBI, and Azure cloud.
• Worked with Azure Data Factory, Azure Data Lake Storage, Azure Synapse Analytics, Azure analytical services, and Azure Cosmos DB.
• Applied dimensional data modeling with tools such as ER/Studio, Erwin, and Sybase PowerDesigner, including star-join schema and snowflake modeling.
• Maintained in-depth knowledge of the AWS platform and its features, including IAM, EC2, EBS, VPC, RDS, CloudWatch, CloudTrail, and CloudFormation.
• Worked with EC2, ECS, Elastic Beanstalk, Lambda, Glue, RDS, DynamoDB, CloudFront, CloudFormation, S3, Athena, SNS, SQS, and Elastic Load Balancing (ELB).
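The transform step above ("modify it in accordance with predetermined business standards") can be sketched as a rule chain applied to each record before load. Rule names and fields are assumptions for illustration, not actual American Express logic:

```python
# Run each record through a chain of business rules before loading.
# A rule returns the (possibly modified) record, or None to reject it.
def apply_rules(records, rules):
    """Return the records that survive every rule, transformed in order."""
    loaded = []
    for rec in records:
        for rule in rules:
            rec = rule(rec)
            if rec is None:       # rule rejected the record; stop the chain
                break
        if rec is not None:
            loaded.append(rec)
    return loaded

# Example (hypothetical) rules
normalize_currency = lambda r: {**r, "amount": round(r["amount"], 2)}
drop_negative = lambda r: r if r["amount"] >= 0 else None
```

Keeping each rule as a small pure function makes the chain easy to reorder, unit-test, and map onto the transformation stages an Informatica mapping would express.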

OSF HealthCare
Dec 2022 - now
My primary responsibility as a Data Engineer in the OSF HealthCare Solutions division is to develop and deploy data solutions that improve the effectiveness and efficiency of healthcare procedures. I work closely with healthcare data to ensure accurate collection, processing, and storage. My duties include developing and refining data pipelines to facilitate patient-centered treatment, payment integrity, and healthcare administration. By applying modern data modeling and integration strategies, I help information flow seamlessly across healthcare systems, promoting better decision-making and operational efficacy.
Responsibilities:
• Designed and implemented scalable data ingestion pipelines using Azure Data Factory, ingesting data from sources such as SQL databases, CSV files, and REST APIs.
• Developed data processing workflows using Azure Databricks, leveraging Spark for distributed data processing and transformation tasks.
• Ensured data quality and integrity by performing data validation, cleansing, and transformation operations using Azure Data Factory and Databricks.
• Designed and implemented a cloud-based data warehouse solution using Snowflake on Azure, leveraging its scalability and performance capabilities.
• Created and optimized Snowflake schemas, tables, and views to support efficient data storage and retrieval for analytics and reporting.
• Collaborated with data analysts and business stakeholders to understand their requirements and implemented appropriate data models and structures in Snowflake.
• Developed and optimized Spark jobs to perform data transformations, aggregations, and machine learning tasks on big data sets.
• Leveraged Azure Synapse Analytics to integrate big data processing and analytics capabilities, enabling seamless data exploration and insight generation.
• Configured event-based triggers and scheduling mechanisms to automate data pipelines and workflows.
Azure Data Engineer
Dec 2022 - now
Azure Snowflake Data Engineer
Dec 2022 - now
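The data-validation bullet above can be sketched as a required-field schema check applied to each ingested record. The schema and field names here are illustrative assumptions, not OSF's actual data model:

```python
# Validate ingested records against a simple required-field schema,
# the kind of quality gate a pipeline runs before loading to the warehouse.
REQUIRED = {"patient_id": str, "visit_date": str, "charge": float}

def validate(record):
    """Return (True, cleaned_record) on success, else (False, error list)."""
    errors, cleaned = [], {}
    for field, ftype in REQUIRED.items():
        value = record.get(field)
        if value is None:
            errors.append(f"missing {field}")
        elif not isinstance(value, ftype):
            errors.append(f"{field} should be {ftype.__name__}")
        else:
            cleaned[field] = value
    return (not errors, cleaned if not errors else errors)
```

In Azure Data Factory or Databricks this check would typically be a mapping data flow assertion or a PySpark filter, with rejected rows routed to a quarantine table for review.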
Licenses & Certifications

Academy Accreditation - Generative AI Fundamentals
Databricks, Aug 2024

Microsoft Certified: Azure Data Engineer Associate
Microsoft, Jun 2024

Professional Data Engineer Certification
Google, May 2024