Getting Started

Can sales of vanilla ice cream overtake chocolate?

Image by Nicky • 👉 PLEASE STAY SAFE 👈 from Pixabay

Table of contents:

  • Problem Statement
  • Data preparation
  • Wrong method 1 — Independent simulation (parametric)
  • Wrong method 2 — Independent simulation (non-parametric)
  • Method 1 — Multivariate distribution
  • Method 2 — Copulas with marginal distributions
  • Method 3 — Simulating historical combinations of sales growth
  • Method 4 — Decorrelating store sales growth using PCA


Monte Carlo simulation is a great forecasting tool for sales, asset returns, project ROI, and more.

In a previous article, I gave a practical introduction to how Monte Carlo simulations can be used in a business setting to predict a range of possible outcomes and their associated probabilities.

In this…

Crossover/recombination oversampling adds novelty to a dataset and can score well on classification metrics vs. SMOTE and random oversampling

Image by liyuanalison at Pixabay

TL;DR — There are many ways to oversample imbalanced data, other than random oversampling, SMOTE, and its variants. In a classification dataset generated using scikit-learn’s make_classification default settings, samples generated using crossover operations outperform SMOTE and random oversampling on the most relevant metrics.
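As a rough illustration of what a crossover operation can look like, here is a minimal sketch. It assumes uniform crossover, where each feature of a synthetic sample is copied from one of two randomly chosen minority-class parents. The function name crossover_oversample and the toy rows are hypothetical, and this may differ from the exact operation used in the article.

```python
import random


def crossover_oversample(minority_rows, n_new, seed=0):
    """Create n_new synthetic rows by uniform crossover of minority-class rows.

    Assumption: each feature of a child is copied from one of two randomly
    chosen parents with equal probability (hypothetical illustration).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        # Pick two distinct parents from the minority class.
        parent_a, parent_b = rng.sample(minority_rows, 2)
        # For every feature, inherit the value from either parent.
        child = [a if rng.random() < 0.5 else b
                 for a, b in zip(parent_a, parent_b)]
        synthetic.append(child)
    return synthetic


# Toy minority-class rows (made-up values, three features each).
minority = [
    [0.1, 1.2, 3.0],
    [0.4, 0.9, 2.5],
    [0.2, 1.5, 2.8],
    [0.3, 1.1, 3.2],
]
new_rows = crossover_oversample(minority, n_new=6)
for row in new_rows:
    print(row)
```

Unlike SMOTE, which interpolates between neighbouring samples, this sketch only recombines feature values that already occur in the minority class.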

Table of contents

  • Introduction
  • Dataset preparation
  • Random oversampling and SMOTE
  • Crossover oversampling
  • Evaluation of performance metrics
  • Conclusion


Many of us have been in the situation of working on a predictive model with an imbalanced dataset.

The most popular approaches to handling the imbalance include:

  • Oversampling techniques
  • Undersampling techniques
  • Combinations of over and under…

Assess probabilities of various business outcomes

Photo by Mark de Jong on Unsplash

Monte Carlo simulation is a computational technique with a wide range of uses, from solving difficult mathematical problems to managing risk.

We will go through two examples to demonstrate how Monte Carlo simulations can help you quantify risks in your next project or business decision.
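The basic mechanics can be sketched in a few lines: draw the uncertain inputs many times, compute the outcome for each draw, and read probabilities off the simulated distribution. Every number below (unit sales, margin, fixed cost) is a made-up assumption for illustration and is not taken from the examples in the article.

```python
import random


def simulate_profits(n_sims=20_000, seed=0):
    """Simulate profit under hypothetical assumptions about sales and margin."""
    rng = random.Random(seed)
    fixed_cost = 4500.0  # assumed fixed cost per period
    profits = []
    for _ in range(n_sims):
        # Assumed: unit sales are roughly normal, floored at zero.
        units = max(0.0, rng.gauss(1000, 200))
        # Assumed: margin per unit is uniform between 4 and 6.
        margin = rng.uniform(4.0, 6.0)
        profits.append(units * margin - fixed_cost)
    return profits


def percentile(sorted_values, q):
    """Simple percentile lookup on an already sorted list."""
    idx = min(len(sorted_values) - 1, int(q / 100 * len(sorted_values)))
    return sorted_values[idx]


profits = simulate_profits()
prob_loss = sum(p < 0 for p in profits) / len(profits)
ordered = sorted(profits)
p5, p50, p95 = (percentile(ordered, q) for q in (5, 50, 95))

print(f"P(loss) = {prob_loss:.1%}")
print(f"5th / 50th / 95th percentile profit: {p5:,.0f} / {p50:,.0f} / {p95:,.0f}")
```

The share of simulated outcomes below zero approximates the probability of a loss, and the percentiles give a range of plausible results rather than a single point forecast.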

Example 1: Sales Offer From a Wholesaler

Suppose you have an innovative product that you have been selling for the past year.

Model Interpretability

Tree-based ensembles and other popular algorithms often produce counter-intuitive predictions when left unchecked

Photo by Jose Vega from Pexels

Table of Contents:

  • Intro to model controllability
  • Preparing a sample dataset (House Sales in King County, USA)
  • Finding the model with the top cross-validation score (CatBoost)
  • Linear model’s outperformance in sanity checks
  • Conclusion


Gradient boosted trees have been used to win several competitions on Kaggle. It is no surprise that, for most tabular datasets, XGBoost or another implementation of boosted decision trees is likely to be the model with the best cross-validation score on your metric(s).

Question — How many times have you deployed a gradient boosted trees model with a supposedly good cross-validation score, but your…

Bassel Karami

Leading a data science team building retail analytics for shopping malls in the MENA region. MSc Econometrics | CFA, FRM, and CMA.
