Vijaya Dasam

Data Engineer

518 followers
United States

  • Timeline

  • About me

    Data Engineer

  • Education

    • Loyola Academy

      2006 - 2009
      Bachelor of Commerce - BCom, Business-Commerce, Computers
  • Experience

    • Oasis Infotech

      Oct 2009 - Apr 2011
      Data Engineer

      • Ran Spark SQL operations on JSON, converting the data into a tabular structure with data frames and writing the results to Hive and HDFS (see the sketch below).
      • Developed shell scripts for data ingestion and validation with different parameters, and wrote custom shell scripts to invoke Spark jobs.
      • Worked on complex SQL queries and PL/SQL procedures and converted them to ETL tasks.
      • Created risk-based machine learning models (logistic regression, random forest, SVM, etc.) to predict which customers are more likely to become delinquent based on historical performance data and to rank-order them.
      • Ingested data from a variety of sources, including Kafka, Flume, and TCP sockets.
      • Processed data using algorithms expressed via high-level functions such as map, reduce, join, and window.
      • Used various DML and DDL commands for data retrieval and manipulation, such as SELECT, INSERT, UPDATE, subqueries, inner joins, outer joins, and UNION.
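
      A minimal PySpark sketch of the JSON-to-Hive flow described in the first bullet; the paths, database, and column names are illustrative assumptions, not details from the role:

        from pyspark.sql import SparkSession

        spark = (SparkSession.builder
                 .appName("json-to-hive")
                 .enableHiveSupport()
                 .getOrCreate())

        # Read semi-structured JSON into a DataFrame (schema inferred)
        events = spark.read.json("hdfs:///data/raw/events/*.json")
        events.createOrReplaceTempView("raw_events")

        # Shape the data into a tabular structure with Spark SQL
        tabular = spark.sql("""
            SELECT customer_id,
                   event_type,
                   CAST(event_ts AS TIMESTAMP) AS event_ts
            FROM raw_events
            WHERE event_type IS NOT NULL
        """)

        # Persist to a Hive table and also write Parquet files to HDFS
        tabular.write.mode("overwrite").saveAsTable("analytics.events")
        tabular.write.mode("overwrite").parquet("hdfs:///data/curated/events")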

    • Experis

      Jun 2011 - Dec 2014
      Data Consultant

      • Drafted and optimized SQL scripts to assess the flow of online quotes into the database, ensuring data validation.
      • Developed and maintained SQL and PL/SQL stored procedures, triggers, partitions, primary keys, indexes, constraints, and views.
      • Created bucketed tables in Hive to optimize map-side joins and job efficiency, including data partitioning for Hive queries.
      • Wrote MapReduce programs and Hive queries for data loading and processing within the Hadoop File System.
      • Configured and maintained Apache Hadoop clusters and tools such as Hive, HBase, and Sqoop.
      • Utilized Sqoop to transfer data from Oracle databases into Hive tables (see the sketch below).
      • Configured session and mapping parameters for adaptable runs with variable modifications.
      • Developed mappings using various transformations, including Source Qualifier, Aggregator, Expression, Filter, Router, Joiner, Stored Procedure, Lookup, Update Strategy, Sequence Generator, and Normalizer.
      • Deployed mapplets to streamline metadata and reduce development time.
      • Monitored workflows using the Workflow Builder and Monitor.
      • Managed the Metadata Warehouse, establishing naming and warehouse standards for future applications.
      • Enhanced mapping performance by optimizing target bottlenecks and implementing pipeline partitioning.
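
      A hedged sketch of how the Oracle-to-Hive transfer with Sqoop might be invoked, wrapped in Python here for consistency with the other sketches; the connection string, credentials path, and table names are placeholders:

        import subprocess

        # Sqoop import from an Oracle source table into a Hive table.
        sqoop_cmd = [
            "sqoop", "import",
            "--connect", "jdbc:oracle:thin:@//oracle-host:1521/ORCL",
            "--username", "etl_user",
            "--password-file", "/user/etl/.oracle_pwd",  # avoid plaintext passwords
            "--table", "QUOTES.ONLINE_QUOTES",
            "--hive-import",
            "--hive-table", "staging.online_quotes",
            "--num-mappers", "4",
        ]
        subprocess.run(sqoop_cmd, check=True)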

    • Wolters Kluwer

      Jun 2016 - May 2018
      Data Engineer

      • Built an application in Java using socket programming to track data from a server and take regular backups, ensuring data integrity and robustness.
      • Worked at the Research and Development Centre, learned about innovations targeted for the near future, and attended weekly expert lectures from industry professionals.
      • Contributed a prototype that tracked and backed up files on a secured server.
      • Automated the data-fetch process, reducing human intervention and eliminating manual errors.
      • Modified existing Talend mappings to load data into Snowflake.
      • Recreated existing AWS objects in Snowflake.
      • Worked with Hive partitioning, dynamic partitions, and bucketing.
      • Implemented Hive UDFs for evaluating, filtering, loading, and storing data (see the sketch below).
      • Loaded data from different sources (databases and files) into Hive using the ETL tool (Standard, MapReduce, and Spark jobs), monitored system health and logs, and responded to warning or failure conditions.
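
      A hedged sketch of a PySpark UDF used to validate records before loading them into Hive; the validation rule, file path, and table names are made up for illustration:

        from pyspark.sql import SparkSession
        from pyspark.sql.functions import col, udf
        from pyspark.sql.types import BooleanType

        spark = (SparkSession.builder
                 .appName("udf-filter-load")
                 .enableHiveSupport()
                 .getOrCreate())

        @udf(returnType=BooleanType())
        def is_valid_isbn(isbn):
            # Toy validation rule; a real check would also verify the checksum digit
            return isbn is not None and len(isbn.replace("-", "")) in (10, 13)

        books = spark.read.option("header", True).csv("hdfs:///staging/books/*.csv")
        valid = books.filter(is_valid_isbn(col("isbn")))
        valid.write.mode("append").saveAsTable("catalog.books")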

    • UCare

      Aug 2018 - Oct 2020
      Data Engineer

      • Loaded data from different sources (databases and files) into Hive using the ETL tool (Standard, MapReduce, and Spark jobs), monitored system health and logs, and responded to warning or failure conditions.
      • Created and managed source-to-target mapping documents for all fact and dimension tables.
      • Used ETL methodologies and best practices to create ETL jobs; followed and enhanced programming and naming standards.
      • Primarily involved in data migration using SQL, SQL Azure, SSIS, and PowerShell.
      • Migrated data from Oracle to the data lake using Sqoop, Spark, and an ETL tool (see the sketch below).
      • Automated Kubernetes cluster provisioning via Terraform.
      • Mapped source files and generated target files in multiple formats such as XML, Excel, and CSV.
      • Used the cloud shell SDK in GCP to configure Dataproc, Cloud Storage, and BigQuery.
      • Transformed data and reports retrieved from various sources and generated derived fields.
      • Reviewed design and requirements documents with architects and business analysts to finalize the design.
      • Involved in Snowflake knowledge-sharing sessions within the organization.
      • Migrated code from development to production using Nexus, scheduling jobs to run in production through Control-M.
      • Experienced with features such as context variables, triggers, and connectors for databases and flat files.
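
      A hedged sketch of the Oracle-to-data-lake path using Spark's JDBC reader (one alternative alongside Sqoop); the connection details, table, and destination path are placeholders:

        from pyspark.sql import SparkSession

        spark = SparkSession.builder.appName("oracle-to-lake").getOrCreate()

        # Pull the source table from Oracle over JDBC
        claims = (spark.read.format("jdbc")
                  .option("url", "jdbc:oracle:thin:@//oracle-host:1521/ORCLPDB")
                  .option("dbtable", "CLAIMS.CLAIM_FACT")
                  .option("user", "etl_user")
                  .option("password", "********")
                  .option("fetchsize", "10000")
                  .load())

        # Land the extract in the data lake as partitioned Parquet
        (claims.write.mode("overwrite")
               .partitionBy("claim_year")
               .parquet("gs://datalake-curated/claims"))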

    • AT&T

      Jan 2021 - Jun 2022
      Sr Data Engineer

      • Developed and optimized multi-node Hadoop clusters, enhancing performance.
      • Designed Big Data analytics platforms for processing customer interface preferences and comments using Hadoop.
      • Utilized Databricks for advanced data analysis and machine learning model development, enabling real-time data processing and improving decision-making within the organization.
      • Created ETL pipelines with Oracle Data Integrator (ODI) to extract data from diverse sources.
      • Contributed to data translation processes, moving client relational database data to the data warehouse.
      • Served as a Hadoop consultant, leveraging technologies such as MapReduce, Pig, and Hive.
      • Utilized Spark RDDs and Python to convert Hive/SQL queries into Spark transformations (see the sketch below).
      • Conducted multiple proofs of concept in Python on the YARN cluster, comparing Spark, Hive, and SQL performance.
      • Led the migration of client data from AWS S3 to Snowflake.
      • Employed Apache Kafka, Apache Storm, and Elasticsearch to build data platforms, pipelines, and storage systems.
      • Implemented solutions for data ingestion using Hadoop, Kafka, and Hive.
      • Managed the migration of SQL Server and Oracle objects to Snowflake.
      • Queried data using Spark SQL and migrated MapReduce programs to Spark transformations.
      • Analyzed and processed S3 data using AWS Athena, Glue crawlers, and Glue jobs.
      • Deployed fault-tolerant, high-availability advertiser applications using AWS services such as EC2, SQS, SNS, IAM, S3, and DynamoDB.
      • Created Hive-compatible table schemas on raw data lake data, partitioned by time and product dimensions, and analyzed them using AWS Athena.
      • Analyzed large data sets with MapReduce programs for aggregation and reporting.
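
      A hedged sketch of converting a Hive/SQL aggregation into Spark DataFrame transformations in Python; the table and column names are invented for illustration:

        from pyspark.sql import SparkSession, functions as F

        spark = (SparkSession.builder
                 .appName("hive-to-spark")
                 .enableHiveSupport()
                 .getOrCreate())

        # Original Hive query, for reference:
        #   SELECT region, COUNT(*) AS interactions, AVG(duration_sec) AS avg_duration
        #   FROM cust.interactions WHERE channel = 'mobile' GROUP BY region;

        interactions = spark.table("cust.interactions")
        summary = (interactions
                   .filter(F.col("channel") == "mobile")
                   .groupBy("region")
                   .agg(F.count("*").alias("interactions"),
                        F.avg("duration_sec").alias("avg_duration")))
        summary.write.mode("overwrite").saveAsTable("cust.mobile_interaction_summary")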

    • Health Care Service Corporation

      Aug 2022 - Present
      Data Engineer

      • Configured data loads from AWS S3 into Redshift using AWS Data Pipeline.
      • Extracted, transformed, and loaded data from heterogeneous sources into AWS Redshift.
      • Developed transformation processes and analyzed datasets using the Hortonworks Distribution for Hadoop ecosystem.
      • Used Amazon EMR for processing Big Data across a Hadoop cluster of virtual servers on Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3).
      • Developed and deployed large-scale data processing solutions using Databricks, leveraging its collaborative environment and optimized Spark performance.
      • Developed data loading strategies and transformations using the Hortonworks Distribution for Hadoop.
      • Created PySpark applications for Spark SQL and data frame transformations, loading the transformed data into Hive.
      • Proficient in the Snowflake cloud data warehouse and other cloud technologies such as AWS and Azure.
      • Ingested large volumes of credit data from multiple providers into AWS S3, developing modular components for S3 connections.
      • Developed Spark code in Python for EMR clusters.
      • Loaded data into Snowflake and SQL Server tables, optimizing for parallelism (Multi Instances concept in DataStage).
      • Designed AWS Glue pipelines for data ingestion, processing, and storage in AWS.
      • Managed Oracle databases for data integrity, availability, and performance.
      • Developed Spark code in Python and Spark SQL for testing and data processing.
      • Created Hive external tables and queried data using HQL.
      • Developed ETL modules and workflows using PySpark and Spark SQL.
      • Developed a PySpark application for reporting tables with masking in Hive and MySQL (see the sketch below).
      • Supported Kafka integrations, including topics, producers, consumers, Schema Registry, Kafka Control Center, KSQL, and streaming applications.
      • Built and maintained statistical routines using PC SAS macros, Enterprise Guide, PL/SQL, and self-written software.
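
      A hedged sketch of a PySpark job that builds a masked reporting table and writes it to Hive and to MySQL over JDBC; the table, column, and connection names are assumptions for illustration:

        from pyspark.sql import SparkSession, functions as F

        spark = (SparkSession.builder
                 .appName("masked-reporting")
                 .enableHiveSupport()
                 .getOrCreate())

        members = spark.table("claims.member_detail")

        # Mask PII before it reaches the reporting layer:
        # keep only the last four digits of the SSN and hash the email address.
        report = (members
                  .withColumn("ssn_masked",
                              F.concat(F.lit("***-**-"), F.substring("ssn", -4, 4)))
                  .withColumn("email_hash", F.sha2(F.col("email"), 256))
                  .drop("ssn", "email"))

        report.write.mode("overwrite").saveAsTable("reporting.member_masked")

        (report.write.format("jdbc")
               .option("url", "jdbc:mysql://reporting-db:3306/reporting")
               .option("dbtable", "member_masked")
               .option("user", "report_user")
               .option("password", "********")
               .mode("overwrite")
               .save())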

  • Licenses & Certifications