Simple random sampling: is it actually simple?

No matter how hard you may try to forget your STAT101 course, you’ll likely default to simple random sampling (SRS) as your knee-jerk approach. It was, after all, an assumption you were told…

How to create a sampling plan for your data project

Imagine that you’re a data scientist who has been hired to estimate the average height of pine trees in the forest pictured below and describe the distribution. You’re responsible for the planning…

Demystifying the Parquet File Format

Have you ever used pd.read_csv() in pandas? Well, that command could have run ~50x faster if you had used Parquet instead of CSV. In this post we will discuss Apache Parquet, an extremely efficient…

How to Use Deliberate Practice to Master the Most Challenging Concepts in Data Science

Using deliberate practice to study data science will set you apart from other data scientists.

Outliers, Leverage, Residuals, and Influential Observations

In data science, a common task is anomaly detection, i.e. understanding whether an observation is “unusual”. First of all, what does it mean to be unusual? In this article we are going to inspect…

What I’ve Learned After Using Git Daily for 6 Months

This article is for those who are starting to learn Git, aspire to learn Git, or have used Git but don’t use it actively. I used to be in the ‘used Git but don’t use it actively’ group. Learning Git…

How to Intuit the Prosecutor’s Fallacy (and Run Better Hypothesis Tests)

There is only a one in a million chance that the accused would match the DNA found at the crime scene. So the accused is guilty beyond reasonable doubt. Sound ok? It isn’t! Based on this evidence…

5 Very Practical Ways AI Can Help To Improve Your Company’s Productivity

Artificial intelligence, or AI, has become so ingrained in our daily lives that there is now an algorithm or AI model for almost anything, from children’s education to home improvement, health…

Machine learning: a friend or a foe for science?

Reproducibility is fundamental for scientific progress, but the increasing use of machine learning is affecting it. Why is reproducibility important? Why does machine learning usage have a problematic side…

How To Print Coloured Text in The Terminal Using Python

Print coloured, underlined, and bold text in Python. This covers colourful output on standard output (the terminal), whether you are printing or logging.