De-Nesting Google Analytics Data in BigQuery

BigQuery is an analytics engine optimized to crunch pre-joined (or: nested) data. Sub-relations make sense in analytical scenarios because we don’t want to deal with joins over bigger datasets — just…

Adding Temporal Resiliency to Data Science Applications

Modern applications almost exclusively store their state in databases and also read any state they require to perform their tasks from databases. We’ll concern ourselves with adding resilience to the…

Simulated Data, Real Learnings : Power Analysis

Simulation is a powerful tool in the data science toolbox. After reading this article, you’ll have a good understanding of how simulation can be used to estimate the power of a designed experiment…

A Practitioners Guide to Retrieval Augmented Generation (RAG)

The recent surge of interest in generative AI has led to a proliferation of AI assistants that can be used to solve a variety of tasks, including anything from shopping for products to searching for…

Creating Synthetic User Research: Using Persona Prompting and Autonomous Agents

Unlocking in-depth analysis with simulated customers and market research using generative AI & large language models. A step-by-step implementation.

Creating Satellite Image Timelapses

A while ago, I wrapped up the know-how of collecting and preparing satellite imagery data from the European Space Agency’s Sentinel satellites in my article titled Deep Dive into ESA’s Sentinel API…

How to build an OpenAI-compatible API

It is early 2024, and the Gen AI market is being dominated by OpenAI. For good reasons, too — they have the first mover’s advantage, being the first to provide an easy-to-use API for an LLM, and they…

How To Control Your Agent Action and Prompt System: LangGraph

If you are keeping up with agents in LangChain, you know there are many ways to build an agent, but in this video, I bring a new idea to the table. So, what I’ll be showing you today is how to create…

System Design: Bloom Filter

The hash table is one of the most widely known and used data structures. With a wise choice of hash function, a hash table can produce optimal performance for insertion, search and deletion queries in…

5 Useful Visualizations to Enhance Your Analysis

I bet it is one of the best-known and most-used libraries for data visualization because it is beginner-friendly, enabling non-statisticians to build powerful graphics that help one extract insights…