Jay Vala

Apprentice Automation Engineer

618 followers
Würzburg, Bavaria, Germany

  • About me

    I try to do science stuff with data.

  • Education

    • SVIT/GTU

      2010 - 2014
      Engineer's Degree, Electrical, Electronics and Communications Engineering (Grade: 7.1)

      Activities and Societies: Prakarsh 2013, 2014 and Vision 2014

    • Bharatiya Vidya Bhavans Ambuja Vidya Niketan

      1995 - 2010
    • Otto-von-Guericke University Magdeburg

      2015 - 2018
      Master of Science (M.S.) Digital Engineering

      Master Thesis: "Classification of multilingual legal text using deep learning: Evaluation of general purpose resources for legal domain specific tasks."

      During my studies I took subjects including Advanced Database Models, Databases, Distributed Data Management, Transaction Management, Data Management for Engineering Applications, Information Retrieval, and Organic Computing.

  • Experience

    • Ambuja Cements Ltd

      Jan 2015 - Aug 2015
      Apprentice Automation Engineer

      Maintained uninterrupted automation processes and resolved issues as they arose.

    • Bayer

      Sept 2017 - Jan 2018
      Deep Learning Intern

      - The aim of the experiment was to prove the concept of using novel technologies (deep learning).
      - The experiment was divided into two subtasks: classification and extraction.
      - Documents originated from different sources in different structures and had to be classified or have information extracted.

      Dataset: text corpus of approx. 200,000 structured and unstructured PDF documents, both scanned and text PDFs, with different documents from different timestamps for the same case.

      First approach:
      - Analyze the dataset and divide it into structured and unstructured sets.
      - Focused on the structured set, which contained both structured and unstructured fields.
      - Hardcoded rules for structured fields; use a neural network for unstructured fields.
      - There was not enough unstructured field data to train a neural network.

      Second approach:
      - Use every document in the dataset instead of a subset.
      - Idea: train a neural network on sentences to give the probability of a sentence being relevant, making further processing easy.

      Dataset curation:
      - Extract text out of the PDFs.
      - Preprocess the data for every document (remove punctuation, stop words, numbers, etc.) using map and reduce.
      - Divide the preprocessed text into sentences and label them according to the probability of being relevant.
      - Train a neural network on the labeled data.
      - Evaluate the results using a confusion matrix, precision, recall and F1-score.
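The evaluation step above can be sketched as follows. This is a minimal illustration of computing precision, recall and F1-score for a binary relevant/not-relevant sentence classifier; the labels and predictions are placeholder values, not the actual project data.

```python
def evaluate(y_true, y_pred):
    """Return (precision, recall, f1) for binary labels (1 = relevant)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1

# Illustrative ground truth and model predictions for six sentences.
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
print(evaluate(y_true, y_pred))
```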

    • Legal Horizon AG

      Aug 2018 - May 2021
      Working Student

      Developed a tagging system for the large legal corpus, which helps refine search results.
      Visualized topic models using pyLDAvis to better explain search results.
      Built LHS_webapp, a RESTful-API-based web service for automatically tagging new documents in the database (SAS).
      Planned the implementation of hierarchical deep learning for text classification and document tagging.
      Scraped data from the EUR-Lex website.
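The tagging idea above can be sketched at its simplest as keyword lookup: each tag has a small vocabulary, and a document receives every tag whose keywords it mentions. The tag vocabularies and the sample document are illustrative assumptions, not the Legal Horizon corpus or its deployed system.

```python
# Hypothetical tag vocabularies; a real system would learn or curate these.
TAG_KEYWORDS = {
    "data-protection": {"gdpr", "personal data", "consent"},
    "employment": {"employee", "dismissal", "working hours"},
}

def tag_document(text):
    """Return the set of tags whose keywords occur in the text."""
    lowered = text.lower()
    return {tag for tag, keywords in TAG_KEYWORDS.items()
            if any(kw in lowered for kw in keywords)}

doc = "The GDPR requires consent before processing personal data."
print(sorted(tag_document(doc)))
```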

    • Scoutbee

      Jul 2019 - Jun 2022

      Built an Entity Resolution pipeline to resolve and match company entities extracted, scraped or acquired from different sources.
      Used MLflow to manage different models.
      Managed different versions and combinations of data using DVC.
      Orchestrated the Entity Resolution pipeline using Kubeflow.
      Extracted data from webpages using NER.
      Evaluated various tools and services available for annotating HTML data.
      Built a PoC for labeling HTML data using Label Studio.
      Set up data in Label Studio for webpage annotation.
      Identified positional features in HTML data that can be used for data extraction.
      Along with the team, implemented an Active Learning loop for Label Studio webpage annotation.

      Designed a strategy for data acquisition, scraping and periodic ingestion of data from web sources via a pipeline.
      Participated in, organized and communicated architectural and tool selection processes for data gathering and ingestion.
      Together with a colleague, formulated the scraping pipeline using Scrapy Cloud.
      Built spiders to scrape directories of company information.
      Used Apache Airflow to periodically trigger scrapers, transform data into a consistent format and load it into an Elasticsearch index.
      Experimented with different LSTM-based models to automate text extraction from webpages.
      Started the Entity Resolution pipeline to resolve and de-duplicate company profiles collected from various sources.
      Along with the team, implemented data validation tools for the internal enrichment platform.

      • Data Scientist

        May 2021 - Jun 2022
      • Junior Data Scientist

        Jul 2019 - May 2021
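The entity-resolution and de-duplication work described above can be sketched as name normalization plus greedy similarity clustering. The company names, normalization rules and similarity threshold here are illustrative assumptions, not the actual pipeline.

```python
import re
from difflib import SequenceMatcher

# Hypothetical set of legal suffixes to strip during normalization.
LEGAL_SUFFIXES = {"inc", "ltd", "gmbh", "ag", "llc", "co"}

def normalize(name):
    """Lowercase, strip punctuation and common legal suffixes."""
    tokens = re.sub(r"[^\w\s]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)

def resolve(records, threshold=0.9):
    """Greedily group records whose normalized names are near-identical."""
    clusters = []  # list of (canonical_normalized_name, [records])
    for rec in records:
        key = normalize(rec)
        for canonical, members in clusters:
            if SequenceMatcher(None, key, canonical).ratio() >= threshold:
                members.append(rec)
                break
        else:
            clusters.append((key, [rec]))
    return [members for _, members in clusters]

# Illustrative records from "different sources".
names = ["Acme Inc.", "ACME, Inc", "Globex GmbH"]
print(resolve(names))
```

A production pipeline would add blocking to avoid pairwise comparison of all records and would compare more fields than the name, but the resolve-then-merge shape is the same.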
    • RTL Data

      Jun 2022 - Present
      Data Scientist
  • Licenses & Certifications