Stop Using Pandas to Read/Write Data — This Alternative is 7 Times Faster

I love Python’s Pandas library. It’s my preferred way to analyze, transform, and preprocess data. But boy, is it slow when it comes to reading and saving data files. It’s a huge time waster…

3 Not So Common Yet Functional Python Libraries for Data Science

One of the reasons Python dominates data science is the rich selection of libraries it offers its users. The active Python community keeps maintaining and improving these libraries, which helps…

The MLOps Engineer Role: A Gentle Introduction

I came across the term “MLOps engineer” a year ago, when I was teaching myself data science. I read many blog posts by data scientists who strongly suggested learning MLOps skills. They stated that it…

Data Pipelines with Apache Beam

Apache Beam is one of the latest projects from Apache, a consolidated programming model for expressing efficient data processing pipelines as highlighted on Beam’s main website [1]. Throughout this…

Well Log Data Outlier Detection With Machine Learning and Python

Outliers are anomalous points within a dataset. They are points that don’t fit within the normal or expected statistical distribution of the dataset and can occur for a variety of reasons such as…

Humidified Coffee for Faster Degassing and Better Espresso

In espresso, one of the main variables for a good shot is roast age. This is more challenging in espresso than in other brewing methods because the CO2 released during the shot slows down flow and extraction. By…

Optimize ML Modeling Using a Timing Decorator

Record the execution time of ML training with timing decorators, and use those measurements to work more productively and to optimize machine learning computation cost and accuracy.
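A minimal sketch of the idea, assuming a plain wall-clock timer is enough: the decorator below wraps any function, measures each call with `time.perf_counter`, and keeps the results in a list attached to the wrapper. The names `timed` and `train_model` are hypothetical, and `train_model` is only a stand-in for a real training loop, not the article's code.

```python
import functools
import time


def timed(func):
    """Hypothetical decorator: records the wall-clock duration of every call."""
    @functools.wraps(func)  # keep func's name and docstring on the wrapper
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs even if func raises, so every call is timed.
            elapsed = time.perf_counter() - start
            wrapper.timings.append(elapsed)
            print(f"{func.__name__} took {elapsed:.4f} s")
    wrapper.timings = []  # elapsed seconds for each call, in call order
    return wrapper


@timed
def train_model(n_iterations):
    """Hypothetical stand-in for an ML training loop: sums squares."""
    total = 0
    for i in range(n_iterations):
        total += i * i
    return total
```

Calling `train_model(100_000)` prints the duration and appends it to `train_model.timings`, so runs with different settings (data size, number of iterations) can be compared side by side when weighing compute cost against accuracy.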

Getting Started with Data Collection Using Twitter API v2 in Less than an Hour

Social media’s ubiquity has made various social media platforms more and more popular as a source of data. With this rise of social media as a data source, data collection using APIs is becoming a…

How to Choose the Right Colors for Data Visualizations

In essence, what is a graph made of? Shapes, lines and bars, probably. Some text elements, definitely. Figures, maybe. But most importantly: colors. Whether you choose a black and white design or a…

Intro to Probability Distributions with Python’s SciPy

A Python tutorial, by example, on SciPy's probability distributions and on a distribution fitter that selects the best among 60 candidate distributions.
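A distribution fitter of this kind can be sketched in a few lines. This is not the tutorial's fitter: it assumes only three candidates (`norm`, `expon`, `uniform`) instead of 60, fits each with the `fit` method of `scipy.stats` (maximum likelihood), and, as an assumed selection criterion, ranks them by the Kolmogorov-Smirnov statistic from `scipy.stats.kstest`. The helper name `best_fit` is hypothetical.

```python
import numpy as np
from scipy import stats


def best_fit(data, candidates=("norm", "expon", "uniform")):
    """Hypothetical helper: fit each candidate scipy.stats distribution to
    data and rank them by the KS statistic (smaller means a closer fit).
    The candidate list and the KS criterion are illustrative assumptions."""
    results = []
    for name in candidates:
        dist = getattr(stats, name)
        params = dist.fit(data)  # shape params (if any), then loc, scale
        ks = stats.kstest(data, name, args=params)
        results.append((ks.statistic, name, params))
    results.sort(key=lambda r: r[0])  # best (lowest KS statistic) first
    return results


# Synthetic sample drawn from a normal distribution (mean 5, sd 2).
rng = np.random.default_rng(0)
sample = rng.normal(loc=5.0, scale=2.0, size=2000)
ranking = best_fit(sample)
for stat, name, params in ranking:
    print(f"{name:8s} KS={stat:.4f} params={np.round(params, 3)}")
```

Extending the sketch toward a larger candidate pool would mean adding more `scipy.stats` names to `candidates`; distributions with shape parameters can be slow or unstable to fit, so a real fitter would likely need error handling around `fit`.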