Pravaliika M

Big Data & Hadoop Engineer

803 followers

Dallas-Fort Worth Metroplex

Timeline
About me
Sr. Data Engineer | 9+ Years of Expertise in Big Data, Spark Development, and Cloud Platforms (AWS & Azure) | Databricks | ETL | Hadoop | BigQuery | Apache Spark
Education
- Osmania University
  2010 - 2014
  Bachelor of Science - BS Computer Science
Experience
- MAISA SOLUTIONS, INC.
  Jun 2014 - Sept 2015
  Big Data & Hadoop Engineer
  The role involved comprehensive data engineering responsibilities, beginning with the use of the Spark SQL context to preprocess model data. A key contribution was designing an efficient HBase row key structure for storing text and JSON in sorted order, optimizing data retrieval. As a leader in ETL design, the responsibilities extended to identifying source systems, creating source-to-target relationships, and developing ETL design documents. Reporting tasks were executed in PySpark, Zeppelin, and Jupyter.
  Workflow management was streamlined through the installation and configuration of Airflow, complemented by custom workflow development in Python. Airflow was also used to run queries through Presto and AWS Athena. Jenkins was incorporated for continuous integration and GitHub for version control. Custom Hadoop applications were developed and deployed in the Azure environment.
  The final aspect of the role involved the data integration tools SSIS and NiFi. Various SSIS tasks were implemented, including loop and sequence containers, script tasks, SQL tasks, and package configuration. Additionally, NiFi procedures were developed to collect files daily from FTP locations and transfer them to HDFS. Overall, the role combined data engineering, workflow management, and work with a range of data processing tools.
- CouthIT
  Oct 2015 - Dec 2017
  AWS Data Engineer
  Proficient in data migration to cloud environments, I've extensively analyzed databases and objects for transition to Azure Synapse. My expertise extends to evaluating platforms such as Snowflake on Azure versus Azure Synapse to find the best fit for diverse business needs. Leveraging PolyBase and Azure Data Factory, I've migrated on-premises data warehouses to Azure Synapse, improving scalability and performance.
  Collaborating with developers, DBAs, and support personnel, I've automated and elevated code deployment processes, ensuring smooth transitions to production environments. Using SSIS packages, I've generated data for reports and streamlined data exports to various formats. Additionally, I've developed and optimized SQL queries and SSIS packages for efficient data fetching and processing, improving overall system performance.
  In delivering end-to-end business intelligence solutions, I've used Microsoft technologies including Azure Data Lake, Databricks, and Azure SQL Data Warehouse. My expertise spans ELT/ETL processes using Azure Data Factory, transforming and loading data from various sources into Azure Synapse. I've designed and maintained Power BI reports for sales and finance teams, while also administering workspaces and implementing security measures.
- Ace Hardware Corporation
  Mar 2018 - Dec 2019
  Senior AWS Data Engineer
  Proficient in Spark and PySpark, I've executed a range of transformations and actions, including work with Parquet and ORC files and Spark Streaming. Additionally, I've used AWS Lambda functions and API Gateway to facilitate data submission, and built CloudFormation templates for various AWS services, integrating them into the system architecture. My expertise extends to building batch and streaming processing applications, alongside orchestrating workflows with Apache Airflow and Oozie to run Hadoop jobs and large-scale data transformations.
  Furthermore, I've handled data management tasks such as importing and exporting data with Sqoop, creating and analyzing Hive tables, and configuring EC2 instances on AWS to establish clusters. For deployment, I've implemented CI/CD solutions using Git, Jenkins, and Docker, and written SQL scripts for data querying and manipulation. My experience encompasses creating advanced Spark applications, integrating Kafka for stream processing, and tuning Hive and MapReduce processes for performance.
  Moreover, I possess comprehensive knowledge of AWS services such as Lambda, Redshift, and RDS, with hands-on experience in migrating data between various databases and creating augmented data lakes. I've converted Hive/SQL queries into RDD transformations for Apache Spark. Additionally, I've written SQL queries for data extraction and transformation to support Tableau dashboards. I've also overseen data governance and quality assurance processes to maintain data integrity and compliance with enterprise standards, collaborated across agile development teams, and maintained the code base using version control systems such as SVN, Git, and Bitbucket.
- Mayo Clinic
  Jan 2020 - Dec 2022
  Azure Data Engineer
  Proficient in data migration to cloud environments, I've extensively analyzed databases and objects for transition to Azure Synapse. My expertise extends to evaluating platforms such as Snowflake on Azure versus Azure Synapse to find the best fit for diverse business needs. Leveraging PolyBase and Azure Data Factory, I've migrated on-premises data warehouses to Azure Synapse, improving scalability and performance.
  Collaborating with developers, DBAs, and support personnel, I've automated and elevated code deployment processes, ensuring smooth transitions to production environments. Using SSIS packages, I've generated data for reports and streamlined data exports to various formats. Additionally, I've developed and optimized SQL queries and SSIS packages for efficient data fetching and processing, improving overall system performance.
  In delivering end-to-end business intelligence solutions, I've used Microsoft technologies including Azure Data Lake, Databricks, and Azure SQL Data Warehouse. My expertise spans ELT/ETL processes using Azure Data Factory, transforming and loading data from various sources into Azure Synapse. I've designed and maintained Power BI reports for sales and finance teams, while also administering workspaces and implementing security measures.
- AgFirst Farm Credit Bank
  Jan 2023 - Present
  Senior Azure Data Engineer
  Proficient in architecting and implementing data migration projects, I have hands-on experience with legacy system migrations such as Teradata to AWS Redshift and on-premises data warehouses to Azure Synapse. Using tools such as PolyBase and Azure Data Factory, I handle data migration and integration across platforms. My expertise extends to developing ETL pipelines using Python, Snowflake's SnowSQL, and Azure Data Factory for extracting, transforming, and loading data.
  In addition, I have contributed significantly to the development of PySpark DataFrames in Azure Databricks for data analysis and transformation. Playing a pivotal role in designing and deploying high-performance ETL pipelines, I use PySpark and Azure Data Factory with Azure services such as Data Lake, Databricks, and SQL Data Warehouse. I also manage and monitor Spark clusters, optimize performance, and resolve operational issues to keep data processing running smoothly.
  Furthermore, my experience encompasses designing and maintaining highly scalable and fault-tolerant multi-tier environments across AWS and Azure using Terraform and CloudFormation. With a solid foundation in the Hadoop ecosystem, I am well-versed in big data technologies such as HDFS, MapReduce, Apache Kafka, and Spark. This breadth of experience positions me as a versatile data engineer capable of tackling complex data challenges across diverse cloud environments.
Licenses & Certifications
- Databricks
  Databricks
  Aug 2021