Vinay Kumar Reddy Sure

Jr. Data Engineer

871 followers
Houston, Texas, United States

  • Timeline

  • About me

    Data Engineer | GenAI & RAG Pipelines | LangChain, LLM | Data Marts and Data Mesh for ML | HIPAA-Compliant Pipelines | SageMaker, Vertex AI, Azure OpenAI | Airflow, Spark, Databricks, Python, Big Data

  • Education

    • Marri Educational Society's Marri Laxman Reddy Institute of Technology and Management

      -
      Bachelor of Technology - BTech, Electrical, Electronics and Communications Engineering, GPA 3.6/4.0

      Completed multiple lab experiments and design-lab activities, including C programming in freshman year, MATLAB, and Java programming, along with core courses such as Embedded Systems and Signals and Systems

    • Central Michigan University

      -
      Master of Science - MS Information Technology

      Activities and Societies: Business Data Analytics and Project Management

  • Experience

    • Interactive Analytics Pvt Ltd

      Jan 2018 - Jan 2019
      Jr. Data Engineer

      Designed and supported scalable ETL pipelines using AWS Glue, Lambda, and Flask for high-frequency trading data. Worked on Redshift, S3, and PostgreSQL to optimize storage and enable trading strategy analytics. Built dashboards in Tableau and QuickSight, automated workflows with Python, Hive, and PySpark on AWS EMR, and contributed to CI/CD pipelines with Jenkins and GitLab CI. Ensured SOX and FINRA compliance with secure data handling practices and IAM-based access controls.

    • Unistring Tech Solutions Pvt. Ltd. (UTS)

      Jan 2019 - Aug 2022
      Data Engineer

      I developed a finance data pipeline using Python, SQL, and Spark to ingest data into an Azure Data Lake, improving data accuracy by 30%. By automating ingestion with Azure Data Factory, I ensured robust data flow and reduced processing time by 25% through event-triggered pipelines transferring data into Azure Synapse Analytics. I optimized SQL performance with dynamic scripts in Databricks and consolidated data from APIs within Databricks using Spark, which reduced anomalies by 20%. I managed an Azure Data Lake, utilizing SQL scripts for efficient data transformations, and applied Python libraries for time series analysis, enhancing budgeting accuracy by 15%. Additionally, I created Power BI dashboards integrated with Azure Synapse Analytics, improving financial tracking accuracy by 30%, and employed financial models in Python and Databricks to support data-driven decision-making. My work in batch and real-time analysis with SQL and Spark improved financial health management by 12% and utilized advanced visualizations to effectively present financial metrics.
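The time-series work described above can be illustrated with a minimal sketch. The rolling-window anomaly check below is a simplified, dependency-free stand-in for the kind of check such a pipeline might run (the actual pipeline used Spark and Databricks); the function name, window size, and threshold are illustrative assumptions, not details from the role.

```python
from statistics import mean, stdev

def moving_average_anomalies(series, window=3, threshold=2.0):
    """Flag indices whose value deviates from the trailing-window
    mean by more than `threshold` standard deviations.

    Illustrative sketch only; a production pipeline would run an
    equivalent check at scale in Spark.
    """
    anomalies = []
    for i in range(window, len(series)):
        trailing = series[i - window:i]
        mu = mean(trailing)
        sigma = stdev(trailing)
        # A zero-variance window cannot signal an anomaly.
        if sigma > 0 and abs(series[i] - mu) > threshold * sigma:
            anomalies.append(i)
    return anomalies
```

For example, in a daily-spend series `[10, 11, 10, 11, 50, 10, 11]`, the spike at index 4 is the only point flagged.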

    • Central Michigan University

      Sept 2022 - Dec 2023
      Data Analyst

      I developed an ETL pipeline with Python, SQL, and Google Cloud Storage, consolidating customer engagement data from various marketing platforms, which increased data integration accuracy by 40%. Leveraging SQL, GCP BigQuery, and APIs, I integrated data from social media, email, and web analytics, reducing processing time by 25%. I implemented data cleaning techniques in Python and GCP Dataflow, enhancing engagement metrics quality by 25%, and built a data warehousing solution in GCP BigQuery to manage historical data efficiently. I applied data mining techniques on GCP to uncover patterns in customer behavior, boosting marketing ROI by 20%. Additionally, I created real-time dashboards using Tableau and BigQuery, improving responsiveness and strategic planning. Through exploratory analysis, advanced analytics, and visualization in Tableau, I enabled actionable insights for better marketing outcomes, increasing campaign effectiveness by 15%.

    • ITech-Go

      Jul 2023 - Jul 2024
      Data Engineer

      I drive the integration and optimization of Epic EHR data for improved reporting and analytics. By implementing ETL processes, I've increased data accessibility by 40% and reduced report generation time by 30%. I develop data models to represent clinical and operational data, enhancing patient care metrics by 20%, and establish rigorous data quality checks, reducing errors by 25%. My responsibilities also include securing patient data with encryption and access controls to ensure HIPAA compliance. I optimize ETL workflows, reducing data latency by 35% for real-time insights, and maintain a data warehousing infrastructure tailored for Epic EHR, increasing retrieval speeds by 25%. Additionally, I automate data processes using Informatica and Talend, improve operational efficiency, and deploy dashboards with Tableau and Power BI, enabling data-driven decision-making across clinical teams. By implementing data governance practices and collaborating with healthcare professionals, I enhance data quality and support data-driven strategies effectively.
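The "rigorous data quality checks" above can be sketched in miniature. The function below shows the shape of a null-and-range validation pass over tabular records; it is a hypothetical illustration (the actual role used Informatica and Talend), and the column names and ranges are invented for the example.

```python
def run_quality_checks(rows, required, ranges):
    """Return (row_index, column, reason) tuples for records that
    fail required-field or numeric-range checks.

    Simplified stand-in for the kind of validation an ETL tool
    such as Informatica or Talend would apply at scale.
    """
    failures = []
    for idx, row in enumerate(rows):
        for col in required:
            # Treat None and empty string as missing values.
            if row.get(col) in (None, ""):
                failures.append((idx, col, "missing"))
        for col, (lo, hi) in ranges.items():
            val = row.get(col)
            if isinstance(val, (int, float)) and not (lo <= val <= hi):
                failures.append((idx, col, "out_of_range"))
    return failures
```

A record with a blank identifier and an implausible age, for instance, would produce one "missing" and one "out_of_range" failure.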

    • MD Anderson Cancer Center

      Jul 2024 - Present
      AI/ML Data Engineer

      • Architected scalable ETL workflows using Apache Airflow (MWAA) to automate ingestion and transformation of Epic EHR data (Clarity, Caboodle, Chronicles), reducing pipeline latency by 40% and improving clinical data availability for analytics and reporting.
      • Implemented Medallion Architecture (Bronze/Silver/Gold layers) using AWS Glue and Lake Formation, improving schema evolution, auditability, and HIPAA compliance.
      • Standardized healthcare data pipelines using Glue jobs and PySpark, reducing downstream data anomalies by 30% and increasing ML readiness across predictive care models.
      • Developed GenAI-powered knowledge graphs from patient history using LangChain and embedding-based RAG pipelines, enabling LLMs to generate accurate, structured insights for physicians and care teams.
      • Monitored model performance and data drift, integrating alerting with Slack, PagerDuty, and Prometheus for real-time ML observability and retraining triggers.

      Achievement: Designed and deployed LangChain-powered GenAI pipelines on AWS and Redshift, integrating vector databases (FAISS/Chroma) with a RAG architecture to extract and structure patient history, diagnoses, and treatment plans from an unstructured EHR warehouse, enabling real-time LLM-based clinical decision support while ensuring HIPAA-compliant context handling.
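The retrieve-then-prompt core of a RAG pipeline like the one described above can be sketched without any dependencies. This is a toy illustration only: the bag-of-words `embed` is a stand-in for a real embedding model, and the in-memory similarity search stands in for a vector database such as FAISS or Chroma; none of the names or data below come from the actual system.

```python
import math
from collections import Counter

def embed(text):
    # Stand-in embedding: bag-of-words term counts. A real RAG
    # pipeline would call a learned embedding model instead.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, documents, k=1):
    # Rank documents by similarity to the query; a vector store
    # (FAISS/Chroma) performs this step at scale.
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query, documents):
    # Ground the LLM by prepending the retrieved context.
    context = "\n".join(retrieve(query, documents, k=2))
    return f"Context:\n{context}\n\nQuestion: {query}"
```

Given a handful of note snippets, a query about hypertension history surfaces the most lexically similar snippet first, and the prompt carries that context to the model.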

  • Licenses & Certifications