December Edition: 2022 Highlights

Last December, as we were looking ahead towards 2022, we wished our entire community a year full of learning, discovery, and calmer times. The world didn’t quite deliver on that last item, but we can… 15 hours ago

River: Online Machine Learning in Python

Data practitioners commonly rely on batch learning, in which an ML model is trained on a complete batch of data at once. An ML pipeline with batch learning typically includes: 11 hours ago

Image color Segmentation by K-means clustering algorithm

Color segmentation is a technique used in computer vision to identify and distinguish different objects or regions in an image based on their colors. Clustering algorithms can automatically group… 21 hours ago

How to Calculate Medians with Grouping in MySQL

Calculating the median of an array of data is pretty straightforward in any programming language, even in Excel, where a built-in or third-party median function can be used directly. However, in… 21 hours ago

The Case Against the Pie Chart

Visualizing quantitative data through charts and graphs is meant to make the data easier to understand and to help readers derive valuable insights from it. Pie charts, however, tend to do the… 11 hours ago

Large-Scale Knowledge Graph Completion on Graphcore IPUs

How Graphcore researchers developed BESS (Balanced Entity Sampling and Sharing)

Camera Radial Distortion Compensation with Gradient Descent

Consumer-grade cameras and lenses are cheap and ubiquitous. Unfortunately, unlike their industrial counterparts, they were not designed to be used as tools for precise measurements in computer vision… 21 hours ago

Understanding Simpson’s Paradox with a Machine Learning Problem Framing

Simpson’s paradox is a well-known statistical paradox. Like all paradoxes (by definition), even if we know the answer, it doesn’t seem intuitive. In the case of Simpson’s Paradox, the Machine… 11 hours ago

Paper Review Monolith: Towards Better Recommendation Systems

A review of recent work by ByteDance, the parent company of TikTok, that highlights a recommendation engine leveraging online training, embeddings, and hashes

Dealing with Date and Time in Pandas DataFrames

One of the common tasks you often need to perform with Pandas DataFrames is that of manipulating date and time. Depending on how the date and time values are originally encoded in the dataset, you… 21 hours ago