No matter how hard you may try to forget your STAT101 course, you’ll probably default to simple random sampling (SRS) as your knee-jerk approach. It was, after all, an assumption you were told…
Imagine that you’re a data scientist who has been hired to estimate the average height of pine trees in the forest pictured below and describe the distribution. You’re responsible for the planning…
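Here is a minimal sketch of the simple random sampling baseline for the pine-tree question. Everything in it is hypothetical: the simulated population of 10,000 tree heights, its distribution, and the sample size of 200 are all made up. The teaser does not show which sampling design the article recommends, so this only illustrates plain SRS with a sample-mean estimate:

```python
import random
import statistics

# Hypothetical population: simulated heights (in metres) for 10,000 trees.
# In a real survey you would not know these; you would only measure the sample.
rng = random.Random(0)
population = [rng.gauss(20.0, 4.0) for _ in range(10_000)]

# Simple random sampling: every tree has the same chance of being chosen,
# drawn without replacement.
sample = rng.sample(population, k=200)

estimate = statistics.mean(sample)
# Standard error of the mean (ignoring the finite-population correction).
std_error = statistics.stdev(sample) / len(sample) ** 0.5

print(f"estimated mean height: {estimate:.2f} m (SE ~ {std_error:.2f} m)")
```

The sample mean is an unbiased estimate under SRS, and the standard error tells you roughly how far it is likely to be from the true forest average.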
Have you ever used pd.read_csv() in pandas? Loading that data could have been ~50x faster if it had been stored as Parquet instead of CSV. In this post we will discuss Apache Parquet, an extremely efficient…
How to Use Deliberate Practice to Master the Most Challenging Concepts in Data Science. Using deliberate practice to study data science will set you apart from other data scientists.
In data science, a common task is anomaly detection, i.e. understanding whether an observation is “unusual”. First of all, what does it mean to be unusual? In this article we are going to inspect…
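One common, simple answer to "what does it mean to be unusual?" is to flag points that lie many standard deviations from the mean (a z-score rule). The teaser does not say which definition the article inspects, so treat this as one illustrative technique. The readings and thresholds are hypothetical:

```python
import statistics


def zscore_outliers(values, threshold=3.0):
    """Return (index, value, z) for points more than `threshold` SDs from the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    flagged = []
    for i, x in enumerate(values):
        z = (x - mean) / sd
        if abs(z) > threshold:
            flagged.append((i, x, z))
    return flagged


# Hypothetical sensor readings with one suspicious value at the end.
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 10.1, 9.7, 10.0, 25.0]
print(zscore_outliers(readings, threshold=2.5))
```

This rule has a catch in small samples. The outlier itself inflates the standard deviation, so the largest possible |z| is (n - 1)/sqrt(n). With ten readings that is about 2.85, which means the popular threshold of 3 can never flag anything here. That is why the usage above uses 2.5.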
This article is for those who are starting to learn Git, aspire to learn Git, or have used Git but don’t use it actively. I used to be in the ‘used Git but don’t use it actively’ group. Learning Git…
There is only a one in a million chance that the accused would match the DNA found at the crime scene. So the accused is guilty beyond reasonable doubt. Sound OK? It isn’t! Based on this evidence…
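The flaw is the classic prosecutor's fallacy. "One in a million" is P(match | innocent), but the court needs P(guilty | match), and Bayes' rule connects the two only through a prior. Here is a minimal sketch under loudly hypothetical assumptions: before the DNA evidence, each of 10 million people was equally likely to be the source, and the true source always matches.

```python
def prob_guilty_given_match(p_match_if_innocent, n_possible_sources):
    """Posterior probability that the accused is the source, given a DNA match.

    Hypothetical assumptions: before the DNA evidence, each of
    `n_possible_sources` people was equally likely to be the source,
    and the true source always matches.
    """
    prior = 1 / n_possible_sources
    p_match_if_guilty = 1.0
    numerator = p_match_if_guilty * prior
    evidence = numerator + p_match_if_innocent * (1 - prior)
    return numerator / evidence


# One-in-a-million match probability, hypothetical pool of 10 million people.
posterior = prob_guilty_given_match(1e-6, 10_000_000)
print(f"P(guilty | match) = {posterior:.3f}")  # roughly 1 in 11
```

Intuitively, among 10 million people about 10 innocent ones would also match. The accused is then one of roughly 11 matching people, far from "beyond reasonable doubt" on this evidence alone.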
Artificial intelligence, or AI, has become so ingrained in our daily lives that there is now an algorithm or AI model for almost anything, from children’s education to home improvement, health…
Reproducibility is fundamental for scientific progress, but the increasing use of machine learning is affecting it. Why is reproducibility important? Why does the use of machine learning have a problematic side…
Print coloured, underlined, and bold text in Python. How to print colourful text to standard output (the terminal) when printing or logging in Python.
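Here is a minimal sketch of one common approach that needs no third-party library: wrapping text in ANSI escape sequences. The teaser does not say which approach the article takes. The helper name `styled` and the constants are hypothetical, and the output only renders as colour or formatting in terminals that interpret ANSI codes. Some older Windows consoles may need extra setup.

```python
# ANSI SGR (Select Graphic Rendition) codes understood by most terminals.
RESET = "\033[0m"
BOLD = "1"
UNDERLINE = "4"
RED = "31"
GREEN = "32"


def styled(text, *codes):
    """Wrap `text` in an ANSI escape sequence, then reset the style."""
    if not codes:
        return text
    return f"\033[{';'.join(codes)}m{text}{RESET}"


print(styled("error", BOLD, RED))   # bold red
print(styled("ok", GREEN))          # green
print(styled("note", UNDERLINE))    # underlined
```

Always appending the reset code keeps the style from leaking into later output. The same strings can be passed to a `logging` message, though they will also appear as raw escape characters if the log is written to a file.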