Gabriel Ducrocq

Data Scientist Research Intern

914 followers
Paris, Île-de-France, France

  • Timeline

  • About me

    I am a researcher in AI, statistics, and machine learning applied to cryo-EM reconstruction.

  • Education

    • University Paris-Est Marne la Vallée

      2014 - 2015
      Master's Degree, Probability and Statistics (applied and theoretical), Master 2 (fifth year of university studies in mathematics)

      Courses followed: Big Data, Time Series, Simulations (Markov chain Monte Carlo methods, Optimization, Copulas), Stochastic Processes, Non-parametric Statistics, Model Selection, Stochastic Calculus.
      The following courses involved an IT project using R: Stochastic Processes, Simulations, Model Selection, Non-parametric Statistics.

    • NA

      2013 - 2014
      Hypnosis/Hypnotherapy

      One-year break practicing and studying hypnotherapy. Co-founded Association France Hypnose and gave lectures on impromptu hypnosis within the association. Association's website: http://afh-hypnose.com/

    • University Paris-Est Marne la Vallée

      2012 - 2013
      Master's Degree, Pure and Applied Mathematics, Master 1 (fourth year of university studies in mathematics)

      Courses followed: Probability Theory, Parametric Statistics, Stochastic Processes, Numerical Analysis applied to Partial Differential Equations, Functional Analysis, Algebra (Galois Theory), Distributions and Partial Differential Equations, Final Paper.
      Graduated cum laude.

    • University of Lille1

      2009 - 2012
      Pure and Applied Mathematics Licence (three years studying mathematics at university)

      Courses followed: Parametric Statistics, Probability, Integration Theory, Numerical Analysis, Algebra, Topology, Differential Calculus, Graph Theory, Complex Analysis.
      Several courses involved an IT project using OCaml, Maple, and Scilab.

    • Ensae ParisTech

      2015 - 2016
      Specialized Master's (sixth year of study), Data Science/Big Data

      Courses followed: Machine Learning and Data Mining, Databases and Web, Computational Statistics (Markov chain Monte Carlo), Econometrics of Marketing, Statistical Analysis of Network Data, Tools for the Analysis of Massive Databases, Hadoop, Bootstrapping and Resampling, Bayesian Statistics.

  • Experience

    • Laboratoire d'Analyse et de Mathématiques Appliquées

      May 2015 - Oct 2015
      Data Scientist Research Intern

      As an intern, I developed a two-step method for supervised classification of Stochastic Differential Equations (SDEs) using the Bayes classifier:
      - First, we use maximum likelihood estimation to estimate the parameters of the SDEs.
      - Second, using these estimates, we build an approximation of the Bayes function and classify based on it.
      The paper is available on my GitHub: https://github.com/Gabriel-Ducrocq/Final_Paper/blob/master/Final_paper.pdf

    • Cheerz.com

      Jun 2016 - Dec 2016
      Data Scientist Intern

      During my six-month internship at Cheerz, I worked on several projects.
      I looked for opportunities to increase conversion on the website and the app. To do this, I worked with various sources of data:
      - The company's own databases
      - The Google Analytics API (tracking data)
      I also built an algorithm that gathers the accounts of potential "influencers" on Instagram (people potentially interested in Cheerz products, with enough followers) using keywords, hashtags, and the Instagram API.
      I was in charge of running analyses on customer data and building dashboards to support the marketing department.
      Finally, I implemented an A/B testing tool using a Bayesian framework instead of the usual frequentist approach. It was designed to avoid the bad consequences of peeking/early stopping.

    • La Javaness

      Mar 2017 - Sept 2017
      Research And Development Data Scientist

      As an R&D Data Scientist, I was in charge of developing machine learning models that responded to the business needs of clients:
      - Natural Language Processing and development of an API enabling automatic email customer service.
      - Natural Language Processing with Deep Learning methods for extracting postal addresses from emails.
      - Maintenance of a Spark ML model designed to target the right customer at the right time for a phone call.
      - Development of a pricing model for discount offers during real-time negotiation.
      Technologies:
      - Python (pandas, scikit-learn, nltk)
      - TensorFlow
      - Spark
      - JavaScript

    • Yubo

      Jul 2018 - Sept 2018
      Data Scientist

      Natural language processing: topic emergence in live streams on the application.
      Detection of spammer profiles on the application, using DataFlow, Google BigTable and MongoDB.

    • ENSAE Paris

      Oct 2018 - May 2022
      PhD Student

      PhD in Bayesian/computational statistics with an application to the study of the Cosmic Microwave Background (CMB).
      Thanks to a cosmological model, we can establish a statistical model which, given the cosmological parameters (dark matter quantity, dark energy quantity, Hubble constant, etc.), generates the CMB. Taking a Bayesian stance and setting a prior on these cosmological parameters, the aim of my thesis was to sample from the posterior distribution, given the observed CMB signal.
      This is a difficult problem, since the CMB signal is roughly 10^6-dimensional. Most of the algorithms require the inversion of a dense 10^6 x 10^6 matrix.
      I chose to improve upon the Gibbs sampler used in that field so far. I improved its performance by a factor of 10 to 100 depending on the components, making this asymptotically exact method actually useful for the practitioner. I published a paper in Physical Review D: https://doi.org/10.1103/PhysRevD.105.103501
      I also developed the Cube method: a method to compress the output of an MCMC algorithm using a geometrical survey sampling technique. I published a paper in Entropy: https://doi.org/10.3390/e23081017
      My PhD developed my ability to work at the intersection of cutting-edge statistical concepts and efficient code writing. In addition, I ran my computations in a multi-CPU/multi-GPU environment. I implemented all my ideas in Python and Cython, using Numba for efficiency.
      Since my project was multi-disciplinary, I am now comfortable discussing research ideas and statistics with people from very different scientific backgrounds.

    • Linköping University

      May 2022 - now
      Postdoctoral Researcher

      I am applying deep learning to biology. More precisely, we tackle the problem of conformational heterogeneity of proteins.
      We collect very noisy images of copies of the same protein in different shapes (conformations), and we want to recover the distribution of these conformations. I have done two things:
      1/ I modified AlphaFold to accept custom input and used it to sample more conformations.
      2/ I used generative modelling (a variational auto-encoder structure) to learn the distribution of the deformations of an AlphaFold output needed to fit it to the different images.
      I used Python, PyTorch, and a multi-GPU environment.
      Since I am working with biologists, I am comfortable discussing research ideas and communicating with people from very different scientific backgrounds. See our project page: https://gabriel-ducrocq.github.io/cryosphere.github.io/

  • Licenses & Certifications

    • TOEFL

      ETS
      Jan 2018
    • Genes and the Human Condition (From Behavior to Biotechnology)

      Coursera Course Certificates
      Feb 2016
    • Python for Genomic Data Science

      Coursera Course Certificates
      Feb 2016
    • Introduction to Genomic Technologies

      Coursera Course Certificates
      Mar 2016