Mateo de Monasterio - MuttData.ai
tryhabitat.com, America Movil, Wildlife Studios
Lead Data Scientist
Buenos Aires, Argentina
Contract and ad-hoc work
Remote work
Available now
Lead Data Scientist
Wildlife Studios: IT Security, IT Infrastructure, Service Delivery, Software Development
Gaming

May 2019 - Present, Sao Paulo, Brazil

About the role

# Anti-Fraud system for Cost-per-Install Advertising Spend

Built an end-to-end anti-fraud system for a global top-10 mobile-gaming unicorn. Once live, the system delivered a 20% cost optimization on bad advertising spend.

The AI model leverages hundreds of billions of data points from mobile-device advertising events: impressions, clicks, installs and post-install events. The system learns what constitutes a normal click-to-install time for a given ad-seller. Then, via probabilistic algorithms, it selects ad-sellers that are anomalous and outside the norm, possibly engaging in click-spamming fraud. Once all selections are made, it prepares an automatic report for the business analysts, who compare each selected ad-seller's metrics to the average/normal ones.
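The idea above (learn a normal click-to-install time, then flag ad-sellers far from it) can be illustrated with a minimal sketch. This is not the production model: the robust z-score on per-seller medians, the function name `flag_anomalous_sellers`, the input layout and the 3.5 threshold are all assumptions made for illustration.

```python
from statistics import median


def flag_anomalous_sellers(ctit_by_seller, threshold=3.5):
    """Flag sellers whose median click-to-install time is a robust outlier.

    ctit_by_seller: hypothetical mapping of seller id -> list of
    click-to-install times in seconds.
    """
    # Summarize each seller by its median click-to-install time.
    medians = {s: median(t) for s, t in ctit_by_seller.items() if t}
    if not medians:
        return []

    # The "normal" value is the median across sellers; spread is the
    # median absolute deviation (MAD), which resists outliers.
    center = median(medians.values())
    mad = median(abs(m - center) for m in medians.values())
    if mad == 0:
        return []  # no spread across sellers, nothing can be scored

    flagged = []
    for seller, m in medians.items():
        # 0.6745 rescales the MAD so the score is comparable to a z-score.
        robust_z = 0.6745 * (m - center) / mad
        if abs(robust_z) > threshold:
            flagged.append(seller)
    return sorted(flagged)
```

The flagged sellers would then go into the analyst report next to the pooled baseline, so a human can confirm or dismiss each case.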

The solution was developed with automatic deployment for model orchestration, model health and model validation with MLFlow. Data transformation was done via Databricks and the AI model was built with Python.

The software tools used were Python, Pandas/Koalas and SciPy for modeling, and Airflow for orchestration.

Skills
Business Activities
  • Data Analysis
Environments
  • Big Data
  • Cloud Computing
  • Cloud Services
IT Infrastructure Expertise
  • Cloud Architecture
IT Infrastructure Products
  • Amazon
  • Amazon (AWS) EC2
  • Amazon AWS
  • AWS S3
  • Databricks
  • Hadoop
  • Kubernetes
  • PostgreSQL
Management Areas of Responsibilities
  • Fraud Prevention
Programming Languages & Frameworks
  • Flask (Python)
  • Python
  • SQL
Project Management Project Types
  • Data management
  • Data Migration
Software Development
  • AI / Machine Learning
  • Algorithms and Data Structures
  • Data Extraction
  • Data Mining
  • Data Science
  • Database Design
Software Development Tools
  • Git
  • GitLab
Lead Data Scientist
tryhabitat.com: Software Development, IT Security, Service Delivery
Food & Drink company

Mar 2019 - Present, Philadelphia, United States

About the role

Built an end-to-end Machine Learning pipeline for a YC-funded food delivery startup.

The model matches future food-delivery demand with courier supply. It forecasts all future week-hours with an absolute error under 10%.
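The profile does not say which error metric is behind the "under 10%" figure. Assuming it is a mean absolute percentage error (MAPE) over hourly demand, a minimal sketch of the check could look like this; the function name and the list inputs are hypothetical.

```python
def mean_absolute_percentage_error(actual, forecast):
    """Return MAPE as a fraction (0.08 means 8%).

    Hours with zero actual demand are skipped, since the percentage
    error is undefined there (an assumption of this sketch).
    """
    pairs = [(a, f) for a, f in zip(actual, forecast) if a != 0]
    if not pairs:
        raise ValueError("no hours with non-zero actual demand")
    return sum(abs(a - f) / abs(a) for a, f in pairs) / len(pairs)


# Toy usage: three hours of actual orders vs. forecast orders.
error = mean_absolute_percentage_error([100, 200, 50], [90, 210, 55])
meets_target = error < 0.10
```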

The solution was developed with automatic deployment for model orchestration, model health and model validation with MLFlow.

It included an API implementation to extract weather data for a demand and stock forecaster, and it interacts with external APIs and AWS RDS.

The architecture relies on Python, Pandas, XGBoost and Scikit-learn for modeling, and Airflow for orchestration.

Skills
Environments
  • Cloud Computing
IT Infrastructure Expertise
  • Cloud Architecture
IT Infrastructure Products
  • Amazon (AWS) EC2
  • Amazon (AWS) RDS
  • Amazon AWS
  • Amazon AWS Auto Scaling
  • Amazon Relational Database Service
  • PostgreSQL
IT Security Expertise
  • Data security
Programming Languages & Frameworks
  • Flask (Python)
  • Python
Software Development
  • AI / Machine Learning
  • Algorithms and Data Structures
  • Data Extraction
  • Data Science
Software Development Tools
  • AWS Lambda
  • Git
Lead Data Scientist
America Movil: Software Development, IT Security, IT Infrastructure, Service Delivery
Telecommunications company - Service Provider, ISP, Mobile

Nov 2017 - Present, Buenos Aires, Argentina

About the role

# KPI Anomaly Detector

The team built a fully automated telco forecaster, anomaly detector and root-cause analysis system for dozens of company-wide KPIs and business metrics.

The program runs unattended on an hourly basis and triggers rules on which to act, aiding business analysts in their decisions.

It uses five different Machine Learning algorithms to provide possible explanations for a detected KPI anomaly. The system is data-engineered to process 250 billion records per year of calls, traffic data and user actions as raw data.

It is orchestrated with Airflow and integrated in Python, with PostgreSQL, Oracle Exadata and Cloudera Hadoop serving as database, data warehouses and data lakes.

The UI was built by the client using SAP UI5.

The system is currently used by the C&C Center to monitor, report and contain disruptions to normal business operations.

# Telco Near-Streaming Data Processor for User Billing

The team built a massively parallel XML file processor for a near-online billing system (5m).

The system fetched millions of small flat files (KBs) or a few huge files (GBs) across different servers, then organized, parsed, grouped and aggregated them to match the output schemas of 30+ tables in an Oracle Exadata database.
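The parse/group/aggregate step can be sketched as follows. The XML layout (`record` elements with `customer` and `service` attributes and a `units` child) and the function name are hypothetical, chosen only to illustrate the shape of the work; the real file formats and table schemas are not described here.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def aggregate_usage(xml_documents):
    """Parse XML documents and sum units per (customer, service) key.

    xml_documents: iterable of XML strings in the hypothetical layout
    <batch><record customer="..." service="..."><units>..</units></record></batch>
    """
    totals = defaultdict(float)
    for doc in xml_documents:
        root = ET.fromstring(doc)
        for record in root.iter("record"):
            # Group by the columns the target table would be keyed on.
            key = (record.get("customer"), record.get("service"))
            totals[key] += float(record.findtext("units", default="0"))
    return dict(totals)
```

Each aggregated key/value pair would map to one row of an output table. A parallel version could run this function per batch of files and merge the partial totals afterwards.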

The output was connected to the company's ERP system for the continuous billing of customers.

The solution used an internal object storage for temporary storage, was developed in Cython and was orchestrated via Airflow.

Skills
Business Activities
  • Data Analysis
  • Data Transformation
Environments
  • Big Data
IT Infrastructure Products
  • Amazon
  • Amazon (AWS) EC2
  • Apache Hadoop
  • AWS S3
  • Cloudera
  • Hadoop
  • Oracle
  • PostgreSQL
Management Consultancy Skills
  • Data and Performance Analysis
Programming Languages & Frameworks
  • Flask (Python)
  • Jinja (Python)
  • Oracle PL/SQL
  • Python
  • SQL
Software Development
  • AI / Machine Learning
  • Data Engineering
  • Data Extraction
  • Data Mining
  • Data Science
  • Database Design
Lead Data Scientist
AUSA: Software Development, IT Infrastructure, Service Delivery
Transport company - Airline, Automotive, Train, Marine

Sep 2018 - Mar 2019, Buenos Aires, Argentina

About the role

This role has no description

Skills
Lead Data Scientist
AGEA: Software Development
Media & Broadcasting company - TV, Music, Movies, Radio, Entertainment

Nov 2017 - Mar 2019, Buenos Aires, Argentina

About the role

This role has no description

Skills
Lead Data Scientist
Jampp: Software Development, IT Infrastructure
Marketing & Advertising company

Dec 2015 - Aug 2018, Buenos Aires, Argentina

About the role

Jampp specializes in boosting mobile sales for clients' apps via RTB advertising. It runs on a real-time Big Data and Machine Learning infrastructure that captures and analyzes billions of data points per hour to maximize return on spend.

As the technical leader of a five-person team, I was in charge of building heterogeneous ML systems to improve relevant business metrics and solve business problems. We addressed topics ranging from fraud and product recommendations to user segmentation and clustering, via unsupervised and supervised Machine/Deep Learning algorithms.

Some of the projects deployed were:

Deep clustering on mobile user sessions: created w2v-based embeddings of user behaviour to characterize users and serve as input for clustering algorithms. Resulted in a +10% increase in CPI in all tests.

Generic product recommendation tool for all mobile apps, built with the open-source Surprise library and Airflow. Improved CPA by 35% and CTR by 5% in all product tests.

Researched the state of the art on Randomized Controlled Trials (also known as A/B tests) for the ad industry. Documented and evangelized proper RCT operations among sales and business teams for the current ad-RCT product. The solution gave statistical validity to a +5% uplift in advertising spend.

Researched, modeled and deployed an anti-fraud algorithm using complex statistical and clustering methods to tackle publishers engaged in click-spamming and click-injection. The solution helped account managers to save accounts that ran budgets in excess of one million dollars. 

Researched, prototyped, developed and fully integrated a customer segmentation algorithm in Python. Tested Random Forest, Logistic Regression and Naive Bayes models to score users on their propensity to make an in-app purchase based on past behaviour. This project led to a clear increase in the company's RTB revenue stream.

Wrote, edited and reviewed the company’s technical blog posts on data science.

Implemented an internal data-science visualization web-app with Bokeh. This portal allows non-technical users to explore the Data team's products through rich interactive visualizations.

Helped design and implement the ETL pipeline using Airflow for all products owned by the Data Science Team, on infrastructure built over Hive, Presto, S3 and MySQL.

Promoted the in-company use of SQL querying, Python and Jupyter Notebooks among non-technical users by providing hands-on training and online Slack discussions. This empowered users to build their own tailored analytics solutions to business-related problems, without relying solely on technical data teams.

Skills
Business Activities
  • Data Analysis
  • Data Transformation
Environments
  • Big Data
  • Cloud Computing
IT Infrastructure Products
  • Amazon
  • Amazon (AWS) EC2
  • Amazon AWS
  • Amazon Relational Database Service
  • AWS S3
  • PostgreSQL
Management Consultancy Skills
  • Data and Performance Analysis
Programming Languages & Frameworks
  • Flask (Python)
  • Python
  • SQL
Software Development
  • AI / Machine Learning
  • Algorithms and Data Structures
  • Data Engineering
  • Data Extraction
  • Data Mining
  • Data Science
  • Database Design
Software Development Tools
  • AWS Lambda
  • Git
  • GitHub
Standards & Regulations
  • General Data Protection Regulation (GDPR)
Junior Researcher
Mundo Sano Foundation: Other, Software Development
Healthcare company

Feb 2015 - Mar 2016, Buenos Aires, Argentina

About the role
Skills
Business Activities
  • Data Analysis
Environments
  • Big Data
  • Healthcare
IT Infrastructure Products
  • Amazon AWS
  • Cloudera
  • PostgreSQL
Programming Languages & Frameworks
  • Python
  • SQL
Software Development
  • AI / Machine Learning
  • Data Extraction
  • Data Mining
  • Data Science
  • ETL (Extract, transform, load)
Software Development Tools
  • Git
  • GitHub