
Pooja B.
Java Developer

About me
AWS Certified Solutions Architect-Associate | Google Cloud Certified-Associate Cloud Engineer | Senior Data Engineer @ Capital One
Education

Alamuri Ratnamala Institute of Engineering and Technology
Bachelor's degree, Computer Science
Monroe College
Master's degree, Computer Science
Experience

EClinicalWorks
Java Developer · Oct 2013 - Dec 2014
• Developed use cases and class diagrams using Rational Rose/UML.
• Used ORM in the persistence layer and implemented DAOs to access data from Oracle and MySQL databases.
• Stored incoming SOAP messages in the JMS queue of WebSphere MQ (MQSeries).
• Developed data access beans and EJBs used to access data from the database.
• Used EJB to inject services and their dependencies.
• Wrote PL/SQL and SQL blocks for the application.
• Used core Java multithreading concepts to avoid concurrency issues.
• Used Log4j for logging, Ant for automated deployment, and JUnit for testing.
• Provided daily development status, weekly status reports, and weekly development summary and defect reports.
• Implemented the project according to the Software Development Life Cycle (SDLC).
• Implemented JDBC for mapping an object-oriented domain model to a traditional relational database.
• Created stored procedures to manipulate the database and apply business logic according to user specifications (see the sketch after this list).
• Developed generic classes covering frequently used functionality so that they are reusable.
• Implemented an exception management mechanism using exception handling application blocks.
• Designed and developed user interfaces using JSP, JavaScript, and HTML.
• Involved in database design and developed SQL queries and stored procedures on MySQL.
• Used CVS for maintaining the source code.
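
A minimal sketch of the stored-procedure call pattern described above. The original work used Java/JDBC; Python with mysql-connector-python is shown here for consistency with the later sketches, and the connection details, procedure name, and arguments are all hypothetical.

    import mysql.connector

    # Connect to the application database (credentials are placeholders).
    conn = mysql.connector.connect(
        host="localhost", user="app", password="secret", database="orders"
    )
    cur = conn.cursor()

    # callproc invokes the stored procedure that applies the business logic.
    cur.callproc("apply_business_rules", (42, "PENDING"))

    # Iterate over any result sets the procedure produced.
    for result in cur.stored_results():
        print(result.fetchall())

    conn.commit()
    cur.close()
    conn.close()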

Rolta India Limited
Hadoop and Spark Developer · Jan 2015 - Jul 2016
• Involved in requirement gathering in coordination with the BA.
• Worked closely with the BA and client to create technical documents such as high-level and low-level design specifications.
• Loaded and transformed large sets of structured, semi-structured, and unstructured data.
• Imported data from MySQL to HDFS on a regular basis using Sqoop.
• Developed RDDs for scheduling various Hadoop programs.
• Wrote Spark SQL queries for data analysis to meet the business requirements.
• Defined job flows.
• Provided cluster coordination services through Kafka and ZooKeeper.
• Serialized JSON data and stored it in tables using Spark SQL (see the sketch after this list).
• Wrote shell scripts to automate the process flow.
• Stored the extracted data in HDFS using Flume.
• Worked with multiple file formats including XML, JSON, CSV, and other compressed formats.
• Wrote Spark SQL queries in Scala.
• Communicated all issues and participated in weekly strategy meetings.
• Collaborated with the infrastructure, network, database, and application teams to ensure data quality and availability.
• Provided daily production support to monitor and troubleshoot Hadoop/Hive jobs.
• Supported and troubleshot Hive programs running on the cluster and fixed issues arising from duration testing.
• Prepared daily and weekly project status reports and shared them with the client.
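
A minimal PySpark sketch of the JSON-to-table flow described above. The original queries were written in Scala; the input path and table names here are hypothetical.

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("json-to-table")
             .enableHiveSupport()
             .getOrCreate())

    # Read semi-structured JSON and expose it to Spark SQL as a view.
    raw = spark.read.json("hdfs:///landing/events/")
    raw.createOrReplaceTempView("events")

    # Aggregate with Spark SQL and persist the result as a managed table.
    daily = spark.sql("""
        SELECT event_date, event_type, COUNT(*) AS cnt
        FROM events
        GROUP BY event_date, event_type
    """)
    daily.write.mode("overwrite").saveAsTable("analytics.daily_event_counts")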

Capital One
Hadoop Developer · Apr 2018 - Dec 2018
• Developed Spark scripts in Scala as per requirements.
• Analyzed large data sets to determine the optimal way to aggregate and report on them.
• Designed and implemented incremental imports into Hive tables.
• Developed Apache Pig and Hive scripts to process HDFS data.
• Involved in defining job flows and managing and reviewing log files.
• Involved in unit testing and delivered unit test plans and results documents using JUnit and MRUnit.
• Supported MapReduce programs running on the cluster.
• Implemented solutions for ingesting data from various sources and processing data at rest using big data technologies such as Hadoop, MapReduce, HBase, Hive, Oozie, Flume, and Sqoop.
• Configured, deployed, and maintained multi-node dev and test Kafka clusters and implemented real-time data ingestion and handling using Kafka.
• Imported data from AWS S3 into Spark RDDs and performed transformations and actions on them (see the sketch after this list).
• Developed Spark scripts using Scala shell commands as per requirements.
• Imported bulk data into HBase using MapReduce programs.
• Performed analytics on time-series data in HBase using the HBase API.
• Involved in collecting, aggregating, and moving data from servers to HDFS using Apache Flume.
• Wrote Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying of the log data.
• Extracted data from Teradata into HDFS, databases, and dashboards using Spark Streaming.
• Responsible for continuous monitoring and management of the Elastic MapReduce cluster through the AWS console.
• Wrote multiple Java programs to pull data from HBase.
• Involved in file processing using Pig Latin.
• Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
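
A minimal PySpark sketch of loading S3 data into an RDD and applying transformations and actions, as described above. The bucket, prefix, and record layout are hypothetical, and s3a access assumes the Hadoop AWS credentials are already configured.

    from pyspark import SparkContext

    sc = SparkContext(appName="s3-rdd-sketch")

    # One RDD element per line of the S3 objects under this prefix.
    lines = sc.textFile("s3a://example-bucket/logs/2018/04/")

    errors = (lines
              .map(lambda line: line.split("\t"))            # transformation: parse fields
              .filter(lambda fields: fields[2] == "ERROR"))  # transformation: keep errors

    # Actions trigger the actual computation.
    print(errors.count())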

Walmart Global Tech
Sr. Big Data Engineer/Data Engineer · Jan 2019 - May 2020
• Implemented Apache Airflow for authoring, scheduling, and monitoring data pipelines.
• Designed several DAGs (directed acyclic graphs) for automating ETL pipelines (see the sketch after this list).
• Performed data migration to GCP.
• Responsible for data services and data movement infrastructure.
• Built ETL solutions and performed data modeling.
• Aggregated daily sales team updates into reports for executives and organized jobs running on Spark clusters.
• Designed and built infrastructure for the Google Cloud environment from scratch.
• Performed dimensional modeling (star schema, snowflake schema), transactional modeling, and SCD (slowly changing dimensions).
• Worked with Confluence and Jira.
• Designed and implemented a configurable data delivery pipeline, built with Python, for scheduled updates to customer-facing data stores.
• Implemented a big data pipeline with real-time processing using Python, PySpark, and the Hadoop ecosystem (HDFS, MapReduce, Hive, Pig, Scala, Sqoop).
• Worked predominantly with Google Cloud Platform (GCP) services: Compute Engine hosting the .NET app on IIS (app server), Cloud SQL PostgreSQL (SSS DB and Lightbox DB) for databases, Internal Load Balancer to balance application server endpoints, HTTP Load Balancer, Stackdriver for logging and monitoring, VPC, other shared-services VPC, IAM, DNS, and KMS.
• Compiled data from various sources to perform complex analysis for actionable results.
• Measured efficiency of the Hadoop/Hive environment, ensuring SLAs were met.
• Developed PySpark code for saving data in Avro and Parquet formats and building Hive tables on top of them.
• Created and executed data pipelines on the GCP and AWS platforms.
• Hands-on experience with GCP: BigQuery, GCS, Cloud Functions, Cloud Dataflow, Pub/Sub, Cloud Shell, the gsutil and bq command-line utilities, and Dataproc.
• Implemented a continuous delivery pipeline with Docker, GitHub, and AWS.
• Built performant, scalable ETL processes to load, cleanse, and validate data.
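
A minimal Airflow DAG sketch for a scheduled ETL pipeline like the ones described above. The DAG id, schedule, and task callables are hypothetical placeholders.

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract():
        print("pull source data")

    def transform():
        print("clean and aggregate")

    def load():
        print("write to the target store")

    with DAG(
        dag_id="daily_sales_etl",
        start_date=datetime(2020, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        t_extract = PythonOperator(task_id="extract", python_callable=extract)
        t_transform = PythonOperator(task_id="transform", python_callable=transform)
        t_load = PythonOperator(task_id="load", python_callable=load)

        # The >> operator defines the DAG edges: extract, then transform, then load.
        t_extract >> t_transform >> t_load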

Bank of America
Senior Big Data Engineer/Hadoop Developer · Jun 2020 - Jul 2022
• Involved in the complete implementation lifecycle, specializing in writing custom MapReduce, Pig, and Hive code.
• Used NiFi to transfer data from source to destination and was responsible for handling both batch and real-time Spark jobs through NiFi.
• Developed microservices using Python scripts with the Spark DataFrame API for the semantic layer.
• Developed Spark scripts in Scala as per requirements.
• Responsible for managing data coming from different sources; involved in HDFS maintenance and the loading of structured and unstructured data.
• Implemented big data analytics and advanced data science techniques to identify trends, patterns, and discrepancies in petabytes of data using Azure Databricks.
• Trained in QlikView and Splunk reporting and dashboards.
• Developed a data pipeline using Kafka, Spark, and Hive to ingest, transform, and analyze data (see the sketch after this list).
• Used Scala to convert Hive/SQL queries into RDD transformations in Apache Spark.
• Involved in data extraction, migration, validation, encryption, decryption, and replication from on-prem to GCP, including bi-directional replication.
• Created data ingestion processes to maintain a global data lake on GCP and BigQuery.
• Built the complete data ingestion pipeline using NiFi, which POSTs flow files through the InvokeHTTP processor to microservices hosted inside Docker containers.
• Used CloudFormation and the Cloud Development Kit (CDK) to define infrastructure resources and provision AWS resources in a repeatable and automated manner, ensuring consistency and reliability across environments.
• Built streaming services for real-time processing of 100,000 users using Java and Scala.
• Led migration of a legacy data warehouse from on-premises to AWS and Java/Spark.
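
A minimal PySpark Structured Streaming sketch of a Kafka ingestion pipeline in the spirit of the one described above. The broker, topic, and output paths are hypothetical; the original services were written in Java and Scala, and the spark-sql-kafka connector is assumed to be on the classpath.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.appName("kafka-ingest").getOrCreate()

    # Subscribe to a Kafka topic and decode the message payloads.
    events = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker:9092")
              .option("subscribe", "user-events")
              .load()
              .select(col("value").cast("string").alias("payload")))

    # Continuously land the stream as Parquet for downstream Hive queries.
    query = (events.writeStream
             .format("parquet")
             .option("path", "hdfs:///warehouse/user_events/")
             .option("checkpointLocation", "hdfs:///checkpoints/user_events/")
             .start())
    query.awaitTermination()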

Freddie Mac
Senior Big Data Engineer · Aug 2022 - Nov 2023
• Implemented complete end-to-end ETL for the Rep and Warrant project using agile methodology and was responsible for risk and failure handling.
• The project involved cloud migration from Oracle (on-prem) to GCP; developed and automated the data migration.
• Developed Python scripts to load raw JSON files and derive the attribute values for the corresponding tables.
• Used Snowpark to extract data from the source and load it into enterprise Snowflake (see the sketch after this list).
• Developed Python scripts to parse embedded JSON files, derive attribute values, and load them into Snowflake tables.
• Implemented PySpark logic to transform and process various formats of data such as XLS, JSON, and TXT.
• Built scripts to load PySpark-processed files into Redshift and used diverse PySpark logic.
• Created Hive generic UDFs to process business logic that varies based on policy.
• Moved relational database data into Hive dynamic-partition tables using Sqoop and staging tables.
• Designed and implemented complex workflows and state machines using AWS Step Functions, orchestrating distributed systems and coordinating tasks across AWS services.
• Used Ansible for application deployment, continuous deployment, and automation.
• Implemented event-driven workflows using AWS EventBridge, enabling seamless integration and communication between various services and systems within the AWS ecosystem.
• Developed predictive analytics using Apache Spark APIs.
• Analyzed and mined business data to identify patterns and correlations among the various data points in Splunk.
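
A minimal Snowpark (Python) sketch of extracting from a source table and loading a transformed result into Snowflake, as described above. The connection parameters, table names, and filter are hypothetical.

    from snowflake.snowpark import Session
    from snowflake.snowpark.functions import col

    # Placeholder credentials; in practice these come from a secrets manager.
    session = Session.builder.configs({
        "account": "xy12345", "user": "etl_user", "password": "secret",
        "warehouse": "ETL_WH", "database": "STAGE", "schema": "RAW",
    }).create()

    # Pull the source rows, apply a simple transformation, and load the result.
    loans = session.table("RAW_LOANS").filter(col("STATUS") == "ACTIVE")
    loans.write.mode("overwrite").save_as_table("ENTERPRISE.CURATED.ACTIVE_LOANS")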

Capital One
Senior Data Engineer · Nov 2023 - Present
• Worked on enhancing the features for the circuit breaker functionality on the Overdraft (OD) UI using Python and Scala; ensured seamless data integration and feature updates to improve user experience and system reliability.
• Loaded and managed data in DynamoDB, ensuring high availability and performance for real-time data access and updates.
• Utilized AWS Step Functions to orchestrate various processes, including inclusion, exclusion, and segmentation steps, ensuring efficient workflow management and automation of complex business logic (see the sketch after this list).
• Developed Scala-based projects to aggregate various rules and check whether user data falls under specific metrics, which included implementing aggregation logic to evaluate complex rule sets, ensuring scalability and performance of rule evaluations, and integrating with AWS services for data processing and storage.
• Leveraged AWS Glue jobs for ETL processes and AWS Lambda for serverless compute to run real-time data processing tasks, which included creating Glue jobs to transform and load data efficiently, using Lambda functions to trigger specific steps within workflows, and ensuring seamless integration between Glue and other AWS services.
• Enhanced the circuit breaker feature to dynamically handle system loads and prevent failures, ensuring robustness and reliability of the application.
• Employed a comprehensive technology stack including Python, Scala, AWS Step Functions, DynamoDB, AWS Glue, and AWS Lambda to deliver high-quality, scalable solutions.
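
A minimal boto3 sketch of starting a Step Functions workflow and recording a result in DynamoDB, in the spirit of the orchestration described above. The state machine ARN, table name, and payload are hypothetical.

    import json

    import boto3

    sfn = boto3.client("stepfunctions")
    ddb = boto3.resource("dynamodb")

    # Kick off one run of the segmentation state machine with a sample input.
    sfn.start_execution(
        stateMachineArn="arn:aws:states:us-east-1:123456789012:stateMachine:od-segmentation",
        input=json.dumps({"customer_id": "c-001", "step": "inclusion"}),
    )

    # Record the outcome for real-time lookups by the application.
    table = ddb.Table("od_segments")
    table.put_item(Item={"customer_id": "c-001", "segment": "eligible"})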
Licenses & Certifications

Associate Cloud Engineer
Google Cloud · Feb 2024
AWS Certified Solutions Architect – Associate
Amazon Web Services (AWS) · Feb 2024
Recommendations

Tejas Saykar
Software Developer at Spitertech Solutions LLP | React Js | Node Js | MongoDB | MERN Stack Develope... · Nashik, Maharashtra, India
Francesca Concio
Senior Civil Structural Engineer at newcleo · Genoa, Liguria, Italy
Chidimma Sylvia Anyanwu, MCIB, MCILRM
Branch Manager at Accion Microfinance Bank Ltd · Nigeria
Dheeru Sharma
Security at Sterling Auxiliaries Pvt Ltd. · Bharuch, Gujarat, India
Liberty Smith
Program Manager at ACC Premiere · Montoursville, Pennsylvania, United States
Ahmad Ashrif A Bakar
Professor at Universiti Kebangsaan Malaysia · Bandar Baru Bangi, Selangor, Malaysia
Pratama Siregar
Performance Marketing Specialist | Lead-Gen Focused-Campaign | Paid Media | Media Planning & Budgeti... · Jakarta, Jakarta, Indonesia
Tyson Seburn
Author, How to Write Inclusive Materials / AD International Programs, UofT · Greater Toronto Area, Canada
Keith Koh
Insurance Loss Adjusting / Civil / Geotechnical Engineering · Singapore
Muhammad Naufan Azhari
People Development | Training & Learning Development · Jakarta, Indonesia
Alexander Marquez, BSN, RN
Emergency Room Nurse for The Mayo Clinic Phoenix Campus · Miami, Florida, United States
Yolanda Silveira
Content Writer · Goa, India
Muhammad Shahroz
Engineer | Data Analyst | WordPress Dev | Google-Certified | UET Alumni | PepsiCo X Amal Talent '23 ... · Lahore, Punjab, Pakistan
Joshua Bennett
Information Technology Business System Analyst at Tarrytown Expocare Pharmacy · Austin, Texas, United States
Blanca R. Jimenez
📈 · Fort Lauderdale, Florida, United States
Maryam Ghane
Iran
Jérôme Cayolle
Data investigator | Product and process sentinel | Root Cause Analysis Specialist | SQL | Big Data |... · Ireland
Tina Liu
Sales Manager at Suzhou Dahua Ship Co., Ltd · China
Suvendu Bikash Deb
Head of Finance & Supply Chain · Bangladesh
Iuliia Zherdieva
Senior Software Engineer · Canada
...