Curriculum Vitae

David Adrián Cañones Castellano

🔬 Data Scientist | 🤖 Machine Learning Engineer | 💹 MBA


Summary

Data Scientist & Machine Learning Engineer with 4+ years of experience helping companies and institutions solve complex problems with data analytics solutions. I have successfully completed projects ranging from predictive modeling to data pipeline design for both enterprises and startups.

I have broad experience with the Python Data Science toolkit (pandas, scikit-learn, TensorFlow, Keras, PySpark, etc.) and work with large amounts of data (on the order of GB to TB) on a daily basis, so I am also experienced with Big Data and distributed computing tools (Hadoop ecosystem: Spark, Hive, Impala, etc.) as well as parallel computing tools (GPU-accelerated computing).

I am the creator of products like The Moderator Guru, a Natural Language Processing tool that automatically spots and classifies offensive text content.

I revitalized DataTau, one of the most important websites for sharing Data Science news and articles and holding discussions. I open sourced the site code on GitHub and maintain the project.

I am a member and supporter of the startup community and the Indie Hackers Community Ambassador for Madrid.

Download this CV (updated June 2019)
Send me an email
Go to my GitHub


Experience

Senior Data Scientist, Pragsis Bidoop (7/2018 - Present):

  • Tasks:

    • Development of Machine Learning models using traditional (scikit-learn, XGBoost, LightGBM) and Deep Learning (TensorFlow, Keras) frameworks
    • Scaling Machine Learning models from prototype to production using distributed and parallel computing (Spark, Dask, Celery)
    • Orchestrating data pipelines using Apache Airflow
    • Leveraging the Python Data Science toolkit to extract valuable information from data and explain it through visualizations and reports, using pandas, NumPy, Matplotlib, Plotly, Bokeh and Seaborn, among other tools
    • Development of Computer Vision solutions using Google Edge TPUs, Nvidia GPUs, TensorFlow and OpenCV
  • Achievements:

    • Reduced power production forecasting error by 10 percentage points for a cluster of 13 wind farms (about 1 GW total managed power) located in Washington, USA, resulting in significant savings for our client
    • Developed a Reinforcement Learning algorithm for the Amazon Web Services DeepRacer League and was a member of the team that took the top three positions (Gold, Silver and Bronze)
    • Developed a live tracking system that detects people using Computer Vision techniques, optimized for low-power hardware (Google Edge TPU, Raspberry Pi)

Data Scientist, Kernel Analytics (now BCG Gamma) (10/2017 - 7/2018):

  • Tasks:

    • Development of Machine Learning models using traditional (scikit-learn) and Big Data (Spark MLlib) frameworks
    • Designing custom KPIs based on customers' needs and data availability
    • Designing data pipelines able to ingest data from heterogeneous inputs into Hadoop Distributed File System (HDFS)
    • Orchestrating data pipeline executions using Apache Airflow
    • Extraction of insights from customers' data and creation of meaningful visualizations using Matplotlib, ggplot2, Seaborn, Plotly and Bokeh
    • Creating interactive dashboards using Plotly Dash and Microsoft PowerBI
  • Achievements:

    • Developed a Customer Experience Management framework for a successful Mobile Operator. Designed the pipeline from ingesting 3G/4G antenna data to building a model that relates Customer Experience to Churn and Complaints, enabling our client to monitor the impact of its mobile network infrastructure on Customer Experience
    • Developed a predictive model for a well-known Mobile Operator that predicts user complaints based on consumption patterns and personal profiles, enabling our client to automate part of its support process

Maintainer, DataTau (6/2019 - Present):

  • Tasks:

    • Web development with Django (Python)
    • Management and maintenance of the open source project (issues, roadmap, pull requests, etc.)
    • Promoting the site in tech forums and communities
    • Hosting management in a personal datacenter
  • Achievements:

    • Development of a fully functional clone of Hacker News in record time (3 days)
    • Viral relaunch of the site, reaching the Hacker News front page and a prominent position on Reddit
    • Hosting on self-owned infrastructure, supporting major traffic spikes
    • Revitalizing the DataTau community and creating an open source project that safeguards the long-term continuity and independence of DataTau

Founder, The Moderator Guru (1/2019 - Present):

  • Tasks:

    • Development of a fast NLP engine capable of detecting and classifying offensive text content
    • Development of the web app around the NLP model, with a Bootstrap and jQuery front end and a Django back end
    • Management of infrastructure based on Linux servers and networking
  • Achievements:

    • Developed an NLP-based product running in production, with a REST API able to serve thousands of requests per minute
    • The full project is bootstrapped and self-hosted on on-premises home infrastructure, which allowed me to become Indie Hackers Community Ambassador for Madrid

Junior Data Scientist, Grupo Servinform (2/2015 - 10/2017):

  • Tasks:

    • Designing data pipelines able to automatically clean and ingest data from heterogeneous inputs into relational databases using pandas and NumPy
    • Developing Natural Language Processing models using NLTK
    • Developing a data product (web app) that makes data exploration easier for pharmaceutical and healthcare researchers, allowing them to establish complex relationships between data from different sources
    • Identifying potential automation opportunities internally and for our clients and validating technical feasibility
  • Achievements:

    • Developed a Natural Language Processing model able to interpret user queries written in natural language and translate them into database queries
    • Developed a framework that automates parts of manual back-office processes and integrates seamlessly with human workers, resulting in a new business line for our company and the ability to tackle projects that would otherwise have been turned down
    • Collaborated with my manager on writing a proposal for the H2020 R&D program, which achieved an excellent result and secured public research funds for our company

Education

EOI Business School (9/2014 - 9/2015):

MBA, Corporate Finance

Universidad de Sevilla (9/2007 - 9/2014):

MS Industrial Engineering, Energy


Honors & Awards

AWS DeepRacer League Madrid, 3rd position (5/2019)

Machine Learning competition organized by AWS that consisted of developing a Reinforcement Learning model for an autonomous car. I took 3rd position in the Spanish competition and was a member of the team that took the top 3 positions (Gold, Silver and Bronze).


Courses & Certifications


Technical Stack

Machine Learning: scikit-learn, XGBoost, LightGBM, H2O, MLlib
Deep Learning: TensorFlow, Keras, PyTorch
Computer Vision: OpenCV
NLP: NLTK, spaCy
In Memory: pandas, NumPy, Apache Arrow
Relational: PostgreSQL, SQLite, Oracle
Big Data: Apache Spark, Hive, Impala
Cloud Computing: AWS (Amazon Web Services)
Orchestration: Apache Airflow, Apache Oozie
Web Development: HTML, CSS, JavaScript, Django, Flask
Languages: Python, R, SQL