Robust One-Hot Encoding

It’s not fun, especially when the issue could have been avoided. One frequent source of trouble is the one-hot encoding of data. Drawing from my own experience, I’ve learned that…
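The teaser does not say which pitfalls the full article covers. Below is a minimal sketch assuming one common failure: training and inference data containing different categories, so the encoded columns no longer line up. The helper `one_hot` and the example categories are hypothetical. It fixes the category list once and maps unseen values to an all-zero row, similar in spirit to scikit-learn's `OneHotEncoder(handle_unknown="ignore")`.

```python
def one_hot(values, categories):
    """One-hot encode values against a fixed, ordered category list.

    Values not in `categories` become an all-zero row instead of
    creating new columns, so every output row has the same length.
    """
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        i = index.get(v)
        if i is not None:
            row[i] = 1
        rows.append(row)
    return rows


# Hypothetical example: categories are fixed once, e.g. from the training data.
categories = ["red", "green", "blue"]
train = one_hot(["red", "blue"], categories)
serve = one_hot(["green", "purple"], categories)  # "purple" was never seen in training
```

Whether silently zeroing unknown values is acceptable depends on the use case; raising an error or adding an explicit "other" column are alternatives.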

Temperature Scaling and Beam Search Text Generation in LLMs, for the ML-Adjacent

If you’ve spent any time with APIs for LLMs like those from OpenAI or Anthropic, you’ll have seen the temperature setting available in the API. How is this parameter used, and how does it work…
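As a rough illustration of what the temperature setting does, here is a minimal sketch of the standard temperature-scaled softmax: the model's raw scores (logits) are divided by the temperature before being turned into probabilities. The function name and logit values are made up for illustration, and hosted APIs may add further details (for example, special handling of a temperature of 0) that this sketch does not model; it simply rejects non-positive temperatures.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Convert raw logits into probabilities, scaled by `temperature`."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # hypothetical next-token scores
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures sharpen the distribution toward the highest-scoring token; higher temperatures flatten it, making less likely tokens more probable when sampling.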

A Simple Way for Downloading Hundreds of Clipped Satellite Images Without Retrieving the Entire Scene (Python)

Learn how to download a clipped Sentinel-2 image for any Area of Interest (AOI), here Lake Tahoe, with just 12 lines of code.

Relation Extraction with Llama3 Models

Relation extraction (RE) is the task of extracting relationships from unstructured text to identify connections between various named entities. It is done in conjunction with named entity recognition…

Unleash Llama3 — How You Can Use the Latest Big-Tech Open-Source LLM

Llama3 is the latest model released by Meta’s AI team. According to Meta’s blog post on Llama3, it outperforms GPT-3.5 in 63.2% of cases in a human evaluation of the instruction-tuned model. By this metric, Llama3's…

Using Double Machine Learning and Linear Programming to Optimise Treatment Strategies

Welcome to my series on Causal AI, where we will explore the integration of causal reasoning into machine learning models. Expect to see a number of practical applications across different…

Hyperparameter Tuning with MLflow and Hydra Sweeps

When we develop machine learning models, we usually need to run many experiments to find the best hyperparameter settings for a given algorithm. This can often lead to dirty code and…

DuckDB and AWS — How to Aggregate 100 Million Rows in 1 Minute

When companies need a secure, performant, and scalable storage solution, they tend to gravitate toward the cloud. One of the most popular options is AWS S3, and for good reason…

Building an Email Assistant Application with Burr

In this tutorial, I will demonstrate how to use Burr, an open-source framework (disclosure: I helped create it), together with simple OpenAI client calls to GPT-4 and FastAPI, to create a custom email…

How to Build a RAG System with a Self-Querying Retriever in LangChain

Recently, I was browsing Max trying to find a movie to watch. Typically this involves browsing through the various lists presented to me, reading a few descriptions, and then picking something that…