Mateo de Monasterio - MuttData.ai

About the role

# Anti-Fraud System for Cost-per-Install Advertising Spend

Built an end-to-end anti-fraud system for a global top-10 mobile-gaming unicorn. Once live, the system delivered a 20% cost reduction on bad advertising spend.

The AI model leverages hundreds of billions of data points from mobile-device advertising events: impressions, clicks, installs and post-install events. The system learns what constitutes a normal click-to-install time for a given ad-seller. Then, via probabilistic algorithms, it selects ad-sellers that are anomalous and outside the norm, possibly engaging in click-spamming fraud. Once all selections are made, it prepares an automatic report for the business analysts, who are in charge of comparing each selected ad-seller's metrics to the average (normal) ones.
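As a rough illustration of the idea of learning a "normal" click-to-install time and flagging ad-sellers that fall outside it, here is a minimal sketch. It is not the production method: it swaps in a simple robust z-score (median and MAD) on each seller's median log click-to-install time, and the seller names, the synthetic data and the `threshold` value are all hypothetical.

```python
import numpy as np


def flag_anomalous_sellers(ctit_by_seller, threshold=5.0):
    """Flag sellers whose typical click-to-install time is far from the norm.

    ctit_by_seller maps a seller id to an array of click-to-install
    times in seconds (hypothetical input layout).
    """
    sellers = sorted(ctit_by_seller)
    # One summary per seller: the median of the log click-to-install time.
    medians = np.array([np.median(np.log(ctit_by_seller[s])) for s in sellers])
    center = np.median(medians)
    # Median absolute deviation, rescaled to be comparable to a std. dev.
    scale = 1.4826 * np.median(np.abs(medians - center))
    if scale == 0:
        # Degenerate case: most sellers share the same summary value.
        return [s for s, m in zip(sellers, medians) if m != center]
    robust_z = np.abs(medians - center) / scale
    return [s for s, z in zip(sellers, robust_z) if z > threshold]


# Synthetic data: 20 sellers with a similar install delay, plus one seller
# whose installs are spread uniformly over a day (a click-spamming pattern).
rng = np.random.default_rng(0)
ctit = {f"seller_{i}": rng.lognormal(4.0, 1.0, size=2000) for i in range(20)}
ctit["seller_spam"] = rng.uniform(1.0, 86400.0, size=2000)

flagged = flag_anomalous_sellers(ctit)
print(flagged)
```

Using the median and MAD rather than the mean and standard deviation keeps a few fraudulent sellers from distorting the baseline they are compared against.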

The solution was developed with automatic deployment for model orchestration, model health and model validation with MLflow. Data transformation was done in Databricks, and the AI model was built in Python.

Software tools used were Python, Pandas/Koalas and SciPy for modeling, and Airflow for orchestration.

Skills

Business Activities

Data Analysis

Environments

Big Data
Cloud Computing
Cloud Services

IT Infrastructure Expertise

Cloud Architecture

IT Infrastructure Products

Amazon
Amazon (AWS) EC2
Amazon AWS
AWS S3
Databricks
Hadoop
Kubernetes
PostgreSQL

Management Areas of Responsibilities

Fraud Prevention

Programming Languages & Frameworks

Flask (Python)
Python
SQL

Project Management Project Types

Data management
Data Migration

Software Development

AI / Machine Learning
Algorithms and Data Structures
Data Extraction
Data Mining
Data Science
Database Design

Software Development Tools

Git
GitLab

About the role

Built an end-to-end machine learning pipeline for a YC-funded food delivery startup.

The model matches future food-delivery demand with delivery-courier supply. It forecasts every hour of the coming week with under 10% absolute error.
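The exact error metric is not stated here. One common reading of "absolute error" as a percentage is the mean absolute percentage error (MAPE), sketched below under that assumption. The hourly order counts are made up for illustration.

```python
import numpy as np


def mean_absolute_percentage_error(actual, forecast):
    """Average of |actual - forecast| / |actual| over all forecast hours."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.mean(np.abs(actual - forecast) / np.abs(actual)))


# Hypothetical orders per hour: observed demand vs. the model's forecast.
actual_orders = [120, 80, 200]
forecast_orders = [114, 86, 190]

mape = mean_absolute_percentage_error(actual_orders, forecast_orders)
print(f"MAPE: {mape:.1%}")
```

With these made-up numbers the per-hour errors are 5%, 7.5% and 5%, so the average lands below the 10% mark.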

The solution was developed with automatic deployment for model orchestration, model health and model validation with MLflow.

Included an API implementation to extract weather data for a demand and stock forecaster. It interacts with external APIs and AWS RDS.

The architecture relies on Python, Pandas, XGBoost and Scikit-learn for modeling, and Airflow for orchestration.

Skills

Environments

Cloud Computing

IT Infrastructure Expertise

Cloud Architecture

IT Infrastructure Products

Amazon (AWS) EC2
Amazon (AWS) RDS
Amazon AWS
Amazon AWS Auto Scaling
Amazon Relational Database Service
PostgreSQL

IT Security Expertise

Data security

Programming Languages & Frameworks

Flask (Python)
Python

Software Development

AI / Machine Learning
Algorithms and Data Structures
Data Extraction
Data Science

Software Development Tools

AWS Lambda
Git

About the role

# KPI Anomaly Detector

The team built a fully automated telco forecaster, anomaly detector and root-cause analysis system for dozens of company-wide KPIs and business metrics.

The program runs fully automated on an hourly basis and triggers rules on which to act, aiding business analysts in their decisions.

It uses five different machine learning algorithms to provide possible explanations for each detected KPI anomaly. The system is engineered to process 250 billion raw records per year of calls, traffic data and user actions.

Orchestrated with Airflow and integrated in Python, with PostgreSQL, Oracle Exadata and Cloudera Hadoop serving as database, data warehouse and data lake.

The UI was built by the client using SAP UI5.

Currently in use by the C&C Center to monitor, report and contain disruptions to normal business operations.

# Telco Near-Streaming Data Processor for User Billing

The team built a massively parallel XML file processor for a near-online billing system (5m).

The system fetched millions of small flat files (KBs) or a few huge files (GBs) across different servers, then organized, parsed, grouped and aggregated them to match the output schemas of 30+ tables in an Oracle Exadata database.

This was connected to the company's ERP system for the continuous billing of customers.

The system used an internal object storage for temporary storage, was developed in Cython and was orchestrated via Airflow.
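The parse, group and aggregate step described above can be sketched roughly as follows. This is only an illustration in plain Python, not the Cython system itself: the `<record customer=... units=...>` layout, the function names and the sample documents are hypothetical, and a thread pool stands in for the real parallel workers.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical input: each string stands in for one small XML usage file.
SAMPLE_DOCUMENTS = [
    "<records><record customer='A' units='3'/><record customer='B' units='5'/></records>",
    "<records><record customer='A' units='4'/></records>",
]


def parse_usage(xml_text):
    """Parse one document and sum the billed units per customer."""
    totals = Counter()
    for record in ET.fromstring(xml_text).iter("record"):
        totals[record.get("customer")] += int(record.get("units"))
    return totals


def aggregate_usage(xml_documents, max_workers=4):
    """Parse documents concurrently, then merge the per-file totals."""
    grand_total = Counter()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for partial in pool.map(parse_usage, xml_documents):
            grand_total.update(partial)  # Counter.update adds counts
    return dict(grand_total)


totals = aggregate_usage(SAMPLE_DOCUMENTS)
print(totals)
```

Because each file is parsed independently and only the small per-file totals are merged, the same shape works with a process pool or a cluster of workers when parsing is CPU-bound.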

Skills

Business Activities

Data Analysis
Data Transformation

Environments

Big Data

IT Infrastructure Products

Amazon
Amazon (AWS) EC2
Apache Hadoop
AWS S3
Cloudera
Hadoop
Oracle
PostgreSQL

Management Consultancy Skills

Data and Performance Analysis

Programming Languages & Frameworks

Flask (Python)
Jinja (Python)
Oracle PL/SQL
Python
SQL

Software Development

AI / Machine Learning
Data Engineering
Data Extraction
Data Mining
Data Science
Database Design

About the role

Jampp specializes in boosting mobile sales for clients' apps via RTB advertising. It runs on a real-time Big Data and Machine Learning infrastructure that captures and analyzes billions of data points per hour to maximize return on spend.

As technical leader of a five-person team, I was in charge of building heterogeneous ML systems to improve relevant business metrics and solve business problems. We addressed topics ranging from fraud and product recommendations to user segmentation and clustering, via unsupervised and supervised machine learning and deep learning algorithms.

Some of the projects deployed were:

Deep clustering algorithms on mobile user sessions: w2v embeddings built from user behaviour characterize each user and serve as input for clustering algorithms. Increase of +10% of CPI in all tests.

Generic product recommendation tool for all mobile apps, using the open-source Surprise library and Airflow. Improved CPA by +35% and CTR by +5% in all product tests.

Researched the state of the art on Randomized Controlled Trials (also known as A/B tests) for the ad industry. Documented and evangelized proper RCT operations for sales and business teams for the current ad-RCT product. The solution provided statistical validity for a +5% uplift in advertising spend.

Researched, modeled and deployed an anti-fraud algorithm using complex statistical and clustering methods to tackle publishers engaged in click-spamming and click-injection. The solution helped account managers to save accounts that ran budgets in excess of one million dollars.

Researched, prototyped, developed and fully integrated a customer segmentation algorithm in Python. Tested Random Forest, Logistic Regression and Naive Bayes models to score users by their propensity to make an in-app purchase based on past behaviour. This project led to a clear increase in the company's RTB revenue stream.

Wrote, edited and reviewed the company’s technical blog posts on data science.

Implemented an internal data-science visualization web app with Bokeh. This portal allows non-technical users to explore the Data team's products through rich interactive visualizations.

Helped design and implement the Airflow ETL pipeline for all products owned by the Data Science team, on top of infrastructure built on Hive, Presto, S3 and MySQL.

Promoted the in-company use of SQL querying, Python and Jupyter Notebooks among non-technical users by providing hands-on training and online Slack discussions. This empowered users to build their own tailored analytics solutions to business-related problems, without relying solely on the technical data teams.
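For the randomized controlled trial work in the project list above, one standard way to check whether an observed uplift is statistically meaningful is a two-proportion z-test. The sketch below assumes that test purely for illustration; the profile does not say which procedure was used, and the conversion counts and the function name are hypothetical.

```python
import math


def two_proportion_z_test(conv_control, n_control, conv_treated, n_treated):
    """Return (relative uplift, z statistic, two-sided p-value)."""
    p_control = conv_control / n_control
    p_treated = conv_treated / n_treated
    # Pooled conversion rate under the null hypothesis of no difference.
    pooled = (conv_control + conv_treated) / (n_control + n_treated)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_control + 1 / n_treated))
    z = (p_treated - p_control) / std_err
    # Two-sided tail probability of a standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    uplift = (p_treated - p_control) / p_control
    return uplift, z, p_value


# Hypothetical experiment: 20,000 users per arm.
uplift, z, p_value = two_proportion_z_test(1000, 20000, 1100, 20000)
print(f"uplift={uplift:.1%}, z={z:.2f}, p={p_value:.3f}")
```

Reporting the p-value next to the uplift helps separate a real effect from noise when the result is presented to sales and business teams.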

Skills

Business Activities

Data Analysis
Data Transformation

Environments

Big Data
Cloud Computing

IT Infrastructure Products

Amazon
Amazon (AWS) EC2
Amazon AWS
Amazon Relational Database Service
AWS S3
PostgreSQL

Management Consultancy Skills

Data and Performance Analysis

Programming Languages & Frameworks

Flask (Python)
Python
SQL

Software Development

AI / Machine Learning
Algorithms and Data Structures
Data Engineering
Data Extraction
Data Mining
Data Science
Database Design

Software Development Tools

AWS Lambda
Git
GitHub

Standards & Regulations

General Data Protection Regulation (GDPR)

About the role

Joint project that applied Big Data techniques using Python on billions of call records for the detection of possible Chagas-infected individuals in Argentina.

Publication accepted at ASONAM (ACM/IEEE). DOI: 10.1109/ASONAM.2016.7752298

Newspaper article featuring the project (‘La Nación’):

http://www.lanacion.com.ar/1822271-mal-de-chagas-usan-llamadas-telefonicas-para-elaborar-un-mapa-de-riesgo

Skills

Business Activities

Data Analysis

Environments

Big Data
Healthcare

IT Infrastructure Products

Amazon AWS
Cloudera
PostgreSQL

Programming Languages & Frameworks

Python
SQL

Software Development

AI / Machine Learning
Data Extraction
Data Mining
Data Science
ETL (Extract, transform, load)

Software Development Tools

Git
GitHub
