Jay Vala

Apprentice Automation Engineer

618 followers
Würzburg, Bavaria, Germany

  • About me

    I try to do science stuff with data.

  • Education

    • SVIT/GTU

      2010 - 2014
      Engineer's Degree, Electrical, Electronics and Communications Engineering (Grade: 7.1)

      Activities and Societies: Prakarsh 2013, 2014 and Vision 2014

    • Bharatiya Vidya Bhavans Ambuja Vidya Niketan

      1995 - 2010
    • Otto-von-Guericke University Magdeburg

      2015 - 2018
      Master of Science (M.S.) Digital Engineering

      Master Thesis: "Classification of multilingual legal text using deep learning: Evaluation of general purpose resources for legal domain specific tasks."

      During my studies I took subjects including Advanced Database Models, Databases, Distributed Data Management, Transaction Management, Data Management for Engineering Applications, Information Retrieval, and Organic Computing.

  • Experience

    • Ambuja Cements Ltd

      Jan 2015 - Aug 2015
      Apprentice Automation Engineer

      Maintained uninterrupted automation processes and resolved issues as they arose.

    • Bayer

      Sept 2017 - Jan 2018
      Deep Learning Intern

      - The aim of the experiment was to prove the concept of using novel technologies (deep learning).
      - The experiment was divided into two subtasks: classification and extraction.
      - Documents originated from different sources in different structures and had to be classified or have information extracted.

      Dataset: text corpus of approx. 200,000 structured and unstructured PDF documents, both scanned and text PDFs, with different documents from different timestamps for the same case.

      First approach:
      - Analyze the dataset and divide it into structured and unstructured sets.
      - Focused on the structured set, which contained both structured and unstructured fields.
      - Hardcoded rules for structured fields; use a neural network for unstructured fields.
      - There was not enough unstructured field data to train a neural network.

      Second approach:
      - Use every document in the dataset instead of a subset.
      - Idea: train a neural network on sentences to give the probability of a sentence being relevant, making further processing easy.

      Dataset curation:
      - Extract text out of the PDFs.
      - Preprocess the data for every document (remove punctuation, stop words, numbers, etc.) using map and reduce.
      - Divide the preprocessed text into sentences and label them according to the probability of being relevant.
      - Train a neural network on the labeled data.
      - Evaluate the results using a confusion matrix, precision, recall and F1-score.
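The evaluation step above can be sketched as follows. This is a minimal illustration of computing precision, recall and F1-score for a binary relevant/not-relevant sentence classifier; the labels and predictions are placeholder values, not the actual project data.

```python
def evaluate(y_true, y_pred):
    """Return (precision, recall, f1) for binary labels (1 = relevant)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1

# Illustrative ground truth and model predictions for six sentences.
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
print(evaluate(y_true, y_pred))
```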

    • Legal Horizon AG

      Aug 2018 - May 2021
      Working Student

      Developed a tagging system for the large legal corpus, which helps refine search results.
      Visualized topic models using pyLDAvis to better explain search results.
      Built LHS_webapp, a RESTful-API-based web service for automatically tagging new documents in the database (SAS).
      Planned the implementation of hierarchical deep learning for text classification and document tagging.
      Scraped data from the EUR-Lex website.
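The tagging idea above can be sketched at its simplest as keyword lookup: each tag has a small vocabulary, and a document receives every tag whose keywords it mentions. The tag vocabularies and the sample document are illustrative assumptions, not the Legal Horizon corpus or its deployed system.

```python
# Hypothetical tag vocabularies; a real system would learn or curate these.
TAG_KEYWORDS = {
    "data-protection": {"gdpr", "personal data", "consent"},
    "employment": {"employee", "dismissal", "working hours"},
}

def tag_document(text):
    """Return the set of tags whose keywords occur in the text."""
    lowered = text.lower()
    return {tag for tag, keywords in TAG_KEYWORDS.items()
            if any(kw in lowered for kw in keywords)}

doc = "The GDPR requires consent before processing personal data."
print(sorted(tag_document(doc)))
```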

    • Scoutbee

      Jul 2019 - Jun 2022

      Built an Entity Resolution pipeline to resolve and match company entities extracted, scraped or acquired from different sources.
      Used MLflow to manage different models.
      Managed different versions and combinations of data using DVC.
      Orchestrated the Entity Resolution pipeline using Kubeflow.
      Extracted data from webpages using NER.
      Evaluated various tools and services available for annotating HTML data.
      Built a PoC for labeling HTML data using Label Studio.
      Set up data in Label Studio for webpage annotation.
      Identified positional features in HTML data that can be used for data extraction.
      Along with the team, implemented an Active Learning loop for Label Studio webpage annotation.

      Designed a strategy for data acquisition, scraping and periodic ingestion of data from web sources via a pipeline.
      Participated in, organized and communicated architectural and tool selection processes for data gathering and ingestion.
      Together with a colleague, formulated the scraping pipeline using Scrapy Cloud.
      Built spiders to scrape directories of company information.
      Used Apache Airflow to periodically trigger scrapers, transform data into a consistent format and load it into an Elasticsearch index.
      Experimented with different LSTM-based models to automate text extraction from webpages.
      Started the Entity Resolution pipeline to resolve and de-duplicate company profiles collected from various sources.
      Along with the team, implemented data validation tools for the internal enrichment platform.

      • Data Scientist

        May 2021 - Jun 2022
      • Junior Data Scientist

        Jul 2019 - May 2021
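The entity-resolution and de-duplication work described above can be sketched as name normalization plus greedy similarity clustering. The company names, normalization rules and similarity threshold here are illustrative assumptions, not the actual pipeline.

```python
import re
from difflib import SequenceMatcher

# Hypothetical set of legal suffixes to strip during normalization.
LEGAL_SUFFIXES = {"inc", "ltd", "gmbh", "ag", "llc", "co"}

def normalize(name):
    """Lowercase, strip punctuation and common legal suffixes."""
    tokens = re.sub(r"[^\w\s]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)

def resolve(records, threshold=0.9):
    """Greedily group records whose normalized names are near-identical."""
    clusters = []  # list of (canonical_normalized_name, [records])
    for rec in records:
        key = normalize(rec)
        for canonical, members in clusters:
            if SequenceMatcher(None, key, canonical).ratio() >= threshold:
                members.append(rec)
                break
        else:
            clusters.append((key, [rec]))
    return [members for _, members in clusters]

# Illustrative records from "different sources".
names = ["Acme Inc.", "ACME, Inc", "Globex GmbH"]
print(resolve(names))
```

A production pipeline would add blocking to avoid pairwise comparison of all records and would compare more fields than the name, but the resolve-then-merge shape is the same.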
    • RTL Data

      Jun 2022 - Present
      Data Scientist
  • Licenses & Certifications