<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Pavittar Singh]]></title><description><![CDATA[Pavittar Singh]]></description><link>https://blog.pavittarx.com</link><image><url>https://cdn.hashnode.com/res/hashnode/image/upload/v1767635065072/fcbb7de6-8fb9-4f98-877e-c06937baf4b8.png</url><title>Pavittar Singh</title><link>https://blog.pavittarx.com</link></image><generator>RSS for Node</generator><lastBuildDate>Fri, 10 Apr 2026 03:07:44 GMT</lastBuildDate><atom:link href="https://blog.pavittarx.com/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[HTTP/3: Faster Connections and Better Mobility]]></title><description><![CDATA[HTTP (Hypertext Transfer Protocol) is the backbone of the World Wide Web. It powers virtually every interaction we have online today. Since its public release in 1991, HTTP has evolved alongside the Web to meet the demands of modern software.
Under t...]]></description><link>https://blog.pavittarx.com/http3-faster-connections-and-better-mobility</link><guid isPermaLink="true">https://blog.pavittarx.com/http3-faster-connections-and-better-mobility</guid><category><![CDATA[http3]]></category><category><![CDATA[Web Development]]></category><category><![CDATA[web]]></category><dc:creator><![CDATA[Pavittar Singh]]></dc:creator><pubDate>Tue, 06 Jan 2026 16:35:41 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/weRQAu9TA-A/upload/ac9ef4f9d38fbf653b7b636ab7bb3e40.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>HTTP</strong> (Hypertext Transfer Protocol) is the backbone of the World Wide Web. It powers virtually every interaction we have online today. Since its public release in 1991, HTTP has evolved alongside the Web to meet the demands of modern software.</p>
<p>Under the hood, HTTP sits on top of three foundational protocols: <strong>TCP</strong>, <strong>IP</strong>, and <strong>UDP</strong>. While supporting actors like DNS and DHCP are essential, TCP, IP, and UDP are the pillars that hold up the Web.</p>
<p><strong>IP (Internet Protocol)</strong> is a protocol that provides rules governing the format of data sent over the internet. It enables devices to communicate by assigning unique identifiers known as IP addresses.<br />An IP address is like a physical address: it tells where the data is coming from or where it needs to reach. An IP address looks like <code>192.168.1.1</code> for IPv4, and <code>2001:0db8:0000:0000:0000:8a2e:0370:7334</code> for IPv6.</p>
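<p>As a quick illustration, Python's standard <code>ipaddress</code> module can parse, validate, and normalize both formats (a minimal sketch, unrelated to any server code):</p>

```python
import ipaddress

# Parse an IPv4 and an IPv6 address; the library validates and normalizes.
v4 = ipaddress.ip_address("192.168.1.1")
v6 = ipaddress.ip_address("2001:0db8:0000:0000:0000:8a2e:0370:7334")

print(v4.version)     # 4
print(v6.version)     # 6
print(v6)             # 2001:db8::8a2e:370:7334 (zero runs compressed)
print(v4.is_private)  # True -- 192.168.x.x is a private range
```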
<h2 id="heading-transport-protocol-tcp-vs-udp"><strong>Transport Protocol: TCP vs. UDP</strong></h2>
<p>Once we have an address, we need a way to transport the data. This is where the two main "courier" protocols come in.</p>
<p><strong>TCP (Transmission Control Protocol)</strong> defines how to establish and maintain a connection through which applications exchange information. It works alongside IP, and the two are often referred to together as <strong>TCP/IP</strong>. TCP is the strict, bureaucratic manager: perfectly accurate, but slow. It checks every packet to ensure nothing is missing. The web needed TCP because a missing <code>&lt;/html&gt;</code> tag could break an entire page; we couldn't afford to lose data.</p>
<p><strong>UDP (User Datagram Protocol)</strong> is the "sibling" protocol to TCP. It does the same job but in a fundamentally different way. UDP is fast and reckless: it fires data at the destination without checking whether it arrived. It adds minimal overhead but offers no guarantees. It doesn't lose packets on purpose, but if a packet <em>does</em> get lost, UDP doesn't care and won't resend it.</p>
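<p>The "fire and forget" nature of UDP is visible even in a few lines of socket code. Below is a minimal Python sketch using the standard <code>socket</code> module; on the loopback interface the datagram arrives reliably, but nothing in the API guarantees it:</p>

```python
import socket

# UDP: SOCK_DGRAM -- send a datagram with no connection setup at all.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))      # OS picks a free port
receiver.settimeout(5)
addr = receiver.getsockname()

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"hello", addr)        # no handshake, no delivery guarantee

data, _ = receiver.recvfrom(1024)    # reliable on loopback; over a lossy
print(data)                          # network this datagram could simply
sender.close()                       # vanish, and UDP would never resend it
receiver.close()
```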
<h2 id="heading-the-evolution-of-http">The Evolution of HTTP</h2>
<h3 id="heading-http-11">HTTP 1.1</h3>
<p>We have used HTTP/1.1 for a long time. It is text-based and human-readable, but it has a major flaw: it processes requests sequentially.</p>
<p>If the browser needs a CSS file and a JS file, it sends a request for the CSS, waits for it to finish, and <em>then</em> asks for the JS. This creates a queue. If the first file gets stuck, everything behind it waits. This traffic jam is known as <strong>Head-of-Line Blocking.</strong></p>
<h3 id="heading-http-2">HTTP 2</h3>
<p><strong>HTTP/2</strong>, released in 2015, introduced <strong>Multiplexing</strong>. This allowed the browser to send requests for Image A, Image B, and Script C simultaneously over a single TCP connection.</p>
<p>It also introduced other critical upgrades:</p>
<ul>
<li><p><strong>Header Compression:</strong> This reduces the amount of repetitive data sent over the wire.</p>
</li>
<li><p><strong>Server Push:</strong> The server anticipates your needs. If you request <code>index.html</code>, it can push <code>style.css</code> and <code>script.js</code> over the same connection immediately—without waiting for you to ask.</p>
</li>
</ul>
<p>However, HTTP/2 had a hidden flaw. While it seemed perfect, it could not solve <strong>Head-of-Line (HOL) Blocking</strong> at the <em>TCP</em> level. Because all files share a single TCP connection, you run the risk of <strong>everything</strong> getting blocked if a single packet is lost due to spotty Wi-Fi or mobile data.</p>
<h2 id="heading-http3">🔒 HTTP/3</h2>
<p><strong>HTTP/3 is a revolution.</strong> It discards TCP entirely in favor of UDP (via <strong>QUIC</strong> – Quick UDP Internet Connections).</p>
<p>TCP is simply too rigid. The "Head-of-Line Blocking" problem inherent to TCP could not be solved without updating every operating system in the world. We needed a blank slate to build a better protocol, and UDP provided exactly that. QUIC essentially builds a new, smarter transport layer on top of UDP.</p>
<p>In <strong>HTTP/2</strong>, a single lost packet blocks the complete transmission. In <strong>HTTP/3</strong>, every file (or stream) has its own lane. If a packet for a single image is lost, only the stream for that specific image pauses. All other resources—CSS, scripts, text—keep downloading without interruption.</p>
<p>HTTP/3 also solves the <strong>mobility problem</strong>. Have you ever had a download fail when you walked out of your house and switched from Wi-Fi to 4G? That is a TCP problem (because your IP address changed). HTTP/3 solves this by using a <strong>Connection ID</strong> instead of an IP address. Even if you change networks, the server recognizes your ID, and the download resumes seamlessly.</p>
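<p>A toy sketch of the idea (not the real QUIC wire format): if sessions are keyed by an opaque connection ID instead of the client's address, a network switch does not break the lookup. The dictionaries and names below are purely illustrative:</p>

```python
import secrets

# Toy illustration: sessions keyed by a connection ID survive an address
# change; sessions keyed by (ip, port), as in TCP, do not.
tcp_sessions = {}   # keyed by (client_ip, client_port)
quic_sessions = {}  # keyed by an opaque connection ID

# Client starts a download over Wi-Fi.
conn_id = secrets.token_hex(8)
tcp_sessions[("10.0.0.5", 51000)] = "download-state"
quic_sessions[conn_id] = "download-state"

# Client walks outside: Wi-Fi drops, 4G assigns a new address.
new_addr = ("172.16.9.9", 40123)

print(tcp_sessions.get(new_addr))   # None -- the TCP session is lost
print(quic_sessions.get(conn_id))   # 'download-state' -- QUIC resumes
```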
<p>Finally, in HTTP/3, <strong>security is baked in</strong>. It uses TLS 1.3 by default. You literally cannot have an unencrypted HTTP/3 connection. There is no <code>http://</code> 🔓, only <code>https://</code> 🔒.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1767717235128/814de00e-96b5-4495-9147-ed258f879da0.webp" alt class="image--center mx-auto" /></p>
<h2 id="heading-the-quic-lifecycle">The QUIC Lifecycle</h2>
<p>In TCP, you shake hands to connect, <em>then</em> shake hands to encrypt. In QUIC, you do both at once.</p>
<p><strong>A. Handshake (0-RTT Magic)</strong> During the first visit, the client sends both encryption keys and connection parameters immediately. The server acknowledges them and opens the connection. For every repeat visit, there is <strong>Zero Round Trip Time (0-RTT)</strong>. Since the client has talked to the server before, it uses a saved "resumption ticket" to send encrypted data in the very first packet.  </p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1767717100885/473392ce-0f5c-4f9a-af5e-d4486fc38260.webp" alt class="image--center mx-auto" /></p>
<p><strong>B. Data Transmission (Frames &amp; Streams)</strong> Once connected, HTTP/3 doesn't send a single blob of data. It organizes data into a hierarchy:</p>
<ol>
<li><p>Every file is assigned its own <strong>Stream ID</strong>. These streams are multiplexed but remain logically independent.</p>
</li>
<li><p>Data is chopped into <strong>Frames</strong> inside a stream.</p>
</li>
<li><p>Frames are wrapped inside <strong>QUIC Packets</strong>.</p>
</li>
<li><p>QUIC packets are wrapped inside <strong>UDP Datagrams</strong>.</p>
</li>
</ol>
<p><em>The key difference:</em> In HTTP/2, the browser knew the streams were separate, but TCP didn’t. In HTTP/3, QUIC knows they are different, so it handles packet loss per stream.</p>
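<p>The hierarchy above can be modeled in a few lines. This toy sketch (illustrative names, not real QUIC framing) shows why a lost packet stalls only its own stream:</p>

```python
# Toy model of HTTP/3 framing: each packet carries (stream_id, seq, frame).
# A lost packet stalls only its own stream; the others still complete.
packets = [
    ("css", 0, b"body{"),
    ("img", 0, b"\x89PNG"),
    ("css", 1, b"color:red}"),
    ("img", 1, b"...rest of image..."),
]
lost = {("img", 1)}    # pretend this one packet was dropped in transit

streams = {}
for stream_id, seq, frame in packets:
    if (stream_id, seq) in lost:
        continue       # only the "img" stream waits for retransmission
    streams.setdefault(stream_id, []).append(frame)

css_done = b"".join(streams["css"])
print(css_done)        # CSS fully assembled despite the lost image packet
```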
<p><strong>C. Header Compression (QPACK)</strong> HTTP/2 used <strong>HPACK</strong>, which relied on packets arriving in perfect order. Since QUIC allows packets to arrive out of order, HPACK breaks. Therefore, HTTP/3 introduces <strong>QPACK</strong> to handle compression without blocking.</p>
<h3 id="heading-the-architectural-shift"><strong>The Architectural Shift</strong></h3>
<p>In HTTP/2, responsibilities were segmented into distinct protocols. HTTP/3 consolidates transport and security responsibilities into a new protocol that runs in User Space (the application).</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Feature</strong></td><td><strong>HTTP/2 Architecture</strong></td><td><strong>HTTP/3 Architecture</strong></td></tr>
</thead>
<tbody>
<tr>
<td><strong>Application Layer</strong></td><td>Handles requests and multiplexing logic.</td><td>Handles only request methods (GET/POST).</td></tr>
<tr>
<td><strong>Security Layer</strong></td><td><strong>TLS</strong> handles encryption via a separate handshake over TCP.</td><td><strong>QUIC</strong> handles encryption (TLS 1.3) intrinsically during the handshake.</td></tr>
<tr>
<td><strong>Transport Layer</strong></td><td><strong>TCP</strong> handles reliability, ordering, and congestion control in the OS Kernel.</td><td><strong>QUIC</strong> handles streams, reliability, and congestion control in the Application (Browser). <strong>UDP</strong> is used strictly as an envelope to traverse firewalls. It provides no reliability guarantees.</td></tr>
<tr>
<td><strong>Network Carrier</strong></td><td><strong>IP</strong> handles routing.</td><td>IP handles routing.</td></tr>
</tbody>
</table>
</div><p>QUIC treats UDP like a blank piece of paper. It writes its own reliability rules, encryption rules, and connection IDs onto that paper. To the outside world (the router), it looks like a simple UDP packet. But inside, it is a sophisticated, reliable QUIC packet.</p>
<p>By moving reliability logic from the <strong>OS Kernel (TCP)</strong> to the <strong>Application Layer (QUIC)</strong>, HTTP/3 finally eliminates Head-of-Line Blocking and modernizes the web.</p>
]]></content:encoded></item><item><title><![CDATA[Don’t Block, Just Queue: The Art of Asynchronous Traffic Control.]]></title><description><![CDATA[Modern systems handle traffic at a staggering scale. Google processes approximately ~190k searches per second, based on an estimated 16.4 billion searches per day. IoT devices generate data ranging from a few kilobytes to 1GB per second. A single Boe...]]></description><link>https://blog.pavittarx.com/dont-block-just-queue-the-art-of-asynchronous-traffic-control</link><guid isPermaLink="true">https://blog.pavittarx.com/dont-block-just-queue-the-art-of-asynchronous-traffic-control</guid><category><![CDATA[message queue]]></category><category><![CDATA[kafka]]></category><category><![CDATA[event streaming]]></category><category><![CDATA[scale]]></category><dc:creator><![CDATA[Pavittar Singh]]></dc:creator><pubDate>Tue, 30 Dec 2025 15:43:22 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/qZ6if8WXl7E/upload/1587378189caf40d8eb18e810ed976e0.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Modern systems handle traffic at a staggering scale. Google processes approximately <strong>~190k searches per second</strong>, based on an estimated 16.4 billion searches per day. IoT devices generate data ranging from a few kilobytes to <strong>1GB per second</strong>. A single Boeing 787 aircraft produces <strong>~5GB of data per second</strong> while in flight.</p>
<p>But these numbers only tell half the story. This is just the data arriving at the "front door."</p>
<p>Once that data enters the system, the volume explodes. A single request often triggers multiple downstream actions—database writes, analytics logging, notifications, and third-party API calls. <strong>5GB of external traffic can easily turn into 50GB of internal traffic.</strong></p>
<p>How do we manage this load without the servers catching fire? <strong>Message Queues.</strong></p>
<p>Message Queues do not magically reduce the size of the data. Instead, they manage <em>how</em> and <em>when</em> we process it. When that Boeing 787 dumps 5GB of data, a traditional system tries to process it all immediately. The database locks up, the CPU spikes to 100%, and the system crashes. This is a <strong>synchronous failure</strong>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1767085059370/f7497709-b12a-417f-8c14-33af69a31141.webp" alt class="image--center mx-auto" /></p>
<ol>
<li><p>Message Queues <strong>decouple</strong> the components of the system, so they can be updated independently.</p>
</li>
<li><p>They smooth out the traffic flow. During traffic spikes the system does not crash; it just takes longer to process the messages, continuing at its steady rate.</p>
</li>
<li><p>Instead of a single service trying to do all the work, the work is distributed among different components. The Ingestion Service does not wait for the Analytics Service to finish; it keeps processing its batch even if the Analytics Service fails.</p>
</li>
<li><p>Message Queues also increase the availability and performance of the system.</p>
</li>
</ol>
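<p>The buffering idea can be sketched with Python's standard <code>queue.Queue</code>: the producer bursts, the single consumer drains at its own pace, and nothing is dropped:</p>

```python
import queue
import threading

# A bounded in-process queue: the producer bursts, the consumer drains at a
# steady rate, and nothing crashes -- the spike is absorbed by the buffer.
q = queue.Queue(maxsize=100)
processed = []

def consumer():
    while True:
        item = q.get()
        if item is None:          # sentinel: shut down cleanly
            break
        processed.append(item)    # steady-rate work would happen here

t = threading.Thread(target=consumer)
t.start()

for i in range(10):               # traffic spike: 10 messages at once
    q.put(i)                      # put() blocks if the buffer is full

q.put(None)                       # tell the consumer we're done
t.join()
print(processed)                  # all 10 absorbed, processed in order
```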
<p>Event streaming platforms are the more advanced counterparts of message queues, and both are heavily used in modern systems. <strong>Amazon SQS, ZeroMQ, RabbitMQ, Apache RocketMQ,</strong> and <strong>Apache ActiveMQ</strong> are some prominent message queues, while <strong>Apache Kafka</strong> and <strong>Apache Pulsar</strong> are popular event streaming platforms.</p>
<h2 id="heading-the-anatomy-how-it-works">The Anatomy: How it Works?</h2>
<p>In order to understand this further, we need to define the 4 organs of the system —</p>
<ul>
<li><p><strong>The Producer:</strong> The service creating the data (e.g., The User Interface or an IoT sensor).</p>
</li>
<li><p><strong>The Message:</strong> The data packet itself (containing the Payload + Headers).</p>
</li>
<li><p><strong>The Broker (The Post Office):</strong> The physical server that receives, routes, and stores the message.</p>
</li>
<li><p><strong>The Consumer:</strong> The service that takes the message off the queue and processes it.</p>
<p>  <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1767108395414/504a1615-a5a3-4ce8-9969-7a5cfc8a7137.webp" alt class="image--center mx-auto" /></p>
</li>
</ul>
<h2 id="heading-queues-vs-event-streams">Queues v/s Event Streams</h2>
<p>While they look similar on the surface (data in, data out), technically they operate very differently. We can divide them into the <strong>Queue Model</strong> (Traditional) and the <strong>Stream Model</strong> (Modern).</p>
<ol>
<li><strong>The Queue Model (RabbitMQ, SQS)</strong></li>
</ol>
<p>Think of this like a task list: once a task is done, it is crossed off the list. The focus is on <strong>delivery</strong>. The goal is to get the message to a consumer and delete it as fast as possible.</p>
<ul>
<li><p><strong>Producer:</strong> Sends the message.</p>
</li>
<li><p><strong>The Exchange (Router):</strong> <em>Specific to advanced queues like RabbitMQ.</em> The producer sends data here first. Based on configured rules, the exchange decides which specific queue receives the message.</p>
</li>
<li><p><strong>The Queue (Buffer):</strong> A transient storage area that holds messages until they are consumed.</p>
</li>
<li><p><strong>The Consumer:</strong> The worker that processes the message.</p>
</li>
<li><p><strong>ACK (The Delete Command):</strong> Once the consumer finishes, it sends an acknowledgment (ACK). The queue then deletes the message immediately.</p>
</li>
</ul>
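<p>The ACK lifecycle above can be sketched in a few lines; <code>AckQueue</code> and its methods are illustrative names, not any real broker's API. A crash before the ACK leads to redelivery, which is at-least-once behavior:</p>

```python
# Minimal at-least-once delivery: a message stays "in flight" until the
# consumer ACKs it; an un-ACKed message is redelivered.
class AckQueue:
    def __init__(self):
        self.pending = []      # not yet delivered
        self.in_flight = {}    # delivered, awaiting ACK

    def publish(self, msg_id, body):
        self.pending.append((msg_id, body))

    def deliver(self):
        msg_id, body = self.pending.pop(0)
        self.in_flight[msg_id] = body
        return msg_id, body

    def ack(self, msg_id):
        del self.in_flight[msg_id]   # ACK == delete; the message is gone

    def requeue_unacked(self):       # e.g. a visibility timeout expired
        for msg_id, body in self.in_flight.items():
            self.pending.append((msg_id, body))
        self.in_flight.clear()

q = AckQueue()
q.publish("m1", "resize image")
msg_id, _ = q.deliver()
q.requeue_unacked()      # consumer crashed before ACKing...
msg_id, _ = q.deliver()  # ...so the same message is delivered again
q.ack(msg_id)
print(len(q.in_flight), len(q.pending))  # 0 0
```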
<ol start="2">
<li><strong>The Event Streaming Model (Kafka/Pulsar)</strong></li>
</ol>
<p>This model focuses on <strong>storage and history</strong>. The goal is to record what happened, in order, and let different workers consume the history at their own pace.</p>
<ul>
<li><p><strong>Producer:</strong> Emits "events" (facts that happened, e.g., "User Clicked").</p>
</li>
<li><p><strong>The Topic:</strong> The category of events (e.g., 'Clicks', 'Invoices').</p>
</li>
<li><p><strong>The Partition (Scalability Engine):</strong> A topic can be too massive for one server, so it is split into partitions. Messages are appended to the end of a partition log file. <em>Note: Order is guaranteed only within a partition.</em></p>
</li>
<li><p><strong>Offset (The Bookmark):</strong> Unlike queues, messages stay in the stream. The offset is just a number tracking where a consumer is. Different consumers can be at different offsets.</p>
</li>
<li><p><strong>Consumer Group:</strong> If a topic has 4 partitions and you have a Consumer Group with 4 consumers, the system assigns one consumer to each partition for parallel processing.</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1767108403003/ad357058-8eb8-4517-a111-bca2c2d78d85.webp" alt class="image--center mx-auto" /></p>
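<p>A partition with per-consumer offsets can be sketched as an append-only list plus bookmarks (illustrative names, not Kafka's actual API):</p>

```python
# Toy event-stream partition: an append-only log plus per-consumer offsets.
# Messages are never deleted on read; each consumer just moves a bookmark.
log = []                                   # one partition of a topic
offsets = {"analytics": 0, "billing": 0}   # each consumer's bookmark

def produce(event: str):
    log.append(event)

def poll(consumer: str, max_records: int = 10):
    start = offsets[consumer]
    batch = log[start:start + max_records]
    offsets[consumer] = start + len(batch)
    return batch

produce("user_clicked")
produce("user_paid")

print(poll("analytics"))     # ['user_clicked', 'user_paid']
print(poll("billing", 1))    # ['user_clicked'] -- an independent bookmark
print(len(log))              # 2 -- nothing was deleted by reading
```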
<h2 id="heading-designing-the-rails-key-architecture-decisions">Designing the Rails — Key Architecture Decisions</h2>
<p>Once you are ready to scale, you cannot just "pick a queue." You need to match the tool to your traffic shape. Here are the seven critical decisions you must make.</p>
<ol>
<li><p><strong>Format &amp; Size of Messages</strong></p>
<p> Are you sending <strong>small JSON notifications or massive video logs</strong>? Most messaging tools (like Kafka and SQS) are optimized for small payloads (typically under 1MB). If you push large files directly into the queue, performance will degrade instantly. <strong>The Fix:</strong> Use the <strong>Claim Check Pattern</strong>. Upload the large file to object storage (like S3) first, then pass only the reference URL in the message payload.</p>
</li>
<li><p><strong>Topology: Point-to-Point vs. Pub/Sub</strong></p>
<p> This defines the relationship between the sender and the receiver.</p>
<ul>
<li><p><strong>Point-to-Point (Work Queues):</strong> This is about <strong>competition</strong>. Even if 50 consumers are listening, a specific message is handled by <em>only one</em> of them. Use this for heavy background tasks (e.g., resizing an image) where you want to distribute the workload without duplicating effort.</p>
</li>
<li><p><strong>Publish/Subscribe (Pub/Sub):</strong> This is about <strong>broadcasting</strong>. The sender sends a message to a topic, and <em>every</em> active service subscribed to that topic gets its own copy. Use this for event notifications (e.g., a "User Signup" event triggers a Welcome Email, a Database Entry, and an Analytics Log simultaneously).</p>
</li>
</ul>
</li>
</ol>
<p>    <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1767108511838/7455ee1d-0ba9-4852-ad18-035c4988bc3c.webp" alt class="image--center mx-auto" /></p>
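<p>The Claim Check Pattern from decision 1 can be sketched as follows; the <code>blob_store</code> dict stands in for object storage like S3, and all names here are illustrative:</p>

```python
import uuid

# Claim Check sketch: 'blob_store' stands in for object storage (e.g. S3);
# the queue carries only a small reference, never the multi-MB payload.
blob_store = {}
message_queue = []

def send_large(payload: bytes):
    key = str(uuid.uuid4())
    blob_store[key] = payload                 # 1. upload the heavy blob
    message_queue.append({"blob_key": key,    # 2. enqueue a tiny claim check
                          "size": len(payload)})

def consume() -> bytes:
    msg = message_queue.pop(0)
    return blob_store[msg["blob_key"]]        # 3. redeem the claim check

send_large(b"x" * 10_000_000)                 # ~10 MB: too big for a queue
print(message_queue[0]["size"])               # 10000000 -- but the queued
data = consume()                              # message itself is tiny
print(len(data))                              # 10000000
```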
<ol start="3">
<li><p><strong>Delivery Semantics (Reliability)</strong></p>
<p> How catastrophic is it if a message is lost? You typically choose between three tiers:</p>
<ul>
<li><p><strong>At-Most-Once (Fire &amp; Forget):</strong> High performance, but if the server crashes, the data is gone. Best for IoT sensor data where missing one temperature reading is acceptable.</p>
</li>
<li><p><strong>At-Least-Once (The Standard):</strong> The message is guaranteed to arrive, but it might arrive twice. Your consumer must be <strong>idempotent</strong> (able to handle duplicates without errors).</p>
</li>
<li><p><strong>Exactly-Once:</strong> The holy grail. It guarantees a single delivery but is expensive in terms of latency and complexity. Best reserved for financial transactions.</p>
</li>
</ul>
</li>
<li><p><strong>Order of Delivery</strong></p>
<p> Does Message A <em>have</em> to be processed before Message B? Strict ordering is the enemy of scale because it forces you to use a single consumer. To get around this, we use <strong>Partitioned Ordering</strong>. You guarantee order only within a specific group (e.g., "All events for User ID 101 are ordered"), but different groups are processed in parallel.</p>
</li>
<li><p><strong>Delivery Mechanism: Push vs. Pull</strong></p>
<p> This defines who controls the flow of data.</p>
<ul>
<li><p><strong>Push (e.g., RabbitMQ):</strong> The <strong>Broker</strong> forces messages onto the consumer. This offers the lowest latency but risks overwhelming the consumer during traffic spikes.</p>
</li>
<li><p><strong>Pull (e.g., Kafka, SQS):</strong> The <strong>Consumer</strong> requests messages when it is ready. This provides excellent flow control (the consumer is never overwhelmed) but introduces a slight latency delay due to polling.</p>
</li>
</ul>
</li>
<li><p><strong>Data Retention: Queue vs. Log</strong></p>
<p> Do you want to delete the message or keep it?</p>
<ul>
<li><p><strong>Traditional Queues (RabbitMQ/SQS):</strong> These are <strong>ephemeral</strong>. The goal is to delete the message as soon as it is processed.</p>
</li>
<li><p><strong>Event Streams (Kafka/Pulsar):</strong> These are <strong>durable</strong>. The message is stored like a log entry for a set time (e.g., 7 days) even after it is read. This allows you to "replay" history if a bug corrupts your database.</p>
</li>
</ul>
</li>
<li><p><strong>Throughput vs. Latency</strong></p>
<p> You generally cannot maximize both.</p>
<ul>
<li><p><strong>Low Latency:</strong> You process messages immediately as they arrive. This limits your total throughput.</p>
</li>
<li><p><strong>High Throughput:</strong> You move massive amounts of data (e.g., 10GB/sec) by <strong>batching</strong> messages together. This increases latency because the system must wait to fill the batch before sending it.</p>
</li>
</ul>
</li>
</ol>
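<p>Partitioned ordering (decision 4) can be sketched by routing each message to a partition based on a stable key, so one user's events stay ordered while different users fan out in parallel (illustrative code, not a specific broker's API):</p>

```python
# Partitioned ordering: a stable key routes all of one user's events to the
# same partition (ordered), while different users spread across partitions.
NUM_PARTITIONS = 4
partitions = [[] for _ in range(NUM_PARTITIONS)]

def publish(user_id: str, event: str):
    p = hash(user_id) % NUM_PARTITIONS        # stable key -> same partition
    partitions[p].append((user_id, event))

for event in ["opened_app", "booked_ride", "paid"]:
    publish("user-101", event)
publish("user-202", "opened_app")             # may land in any partition

p101 = hash("user-101") % NUM_PARTITIONS
events_101 = [e for uid, e in partitions[p101] if uid == "user-101"]
print(events_101)   # ['opened_app', 'booked_ride', 'paid'] -- still in order
```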
<p>Message Queues are the unsung heroes of modern architecture. They allow our systems to breathe, to buffer, and to survive the inevitable chaos of the real world. Whether you are building a simple email notifier or a global streaming platform, the principle remains the same: <strong>Don't block. Just queue.</strong></p>
]]></content:encoded></item><item><title><![CDATA[Design Choices for Location Based Services III]]></title><description><![CDATA[Let’s jump back in where we left off with Quad Trees. You remember those, right? Or you are experiencing memory leaks just like me?
Need a refresher? didn’t you read the blog I sent you? I know you didn’t. Ah! no worries, we’ll figure it out.
Google ...]]></description><link>https://blog.pavittarx.com/design-choices-for-location-based-services-iii</link><guid isPermaLink="true">https://blog.pavittarx.com/design-choices-for-location-based-services-iii</guid><category><![CDATA[s2]]></category><category><![CDATA[uber h3]]></category><category><![CDATA[h3]]></category><category><![CDATA[google s2]]></category><category><![CDATA[#LocationServices ]]></category><dc:creator><![CDATA[Pavittar Singh]]></dc:creator><pubDate>Mon, 15 Dec 2025 19:30:41 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/oCMkiLOSz44/upload/2666dea0557b9195e9d569f0a9de4a18.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Let’s jump back in where we left off with Quad Trees. You remember those, right? Or <strong>you are</strong> experiencing memory leaks just like me?</p>
<p>Need a refresher? Didn’t you read the blog I sent you? I know you didn’t. Ah! No worries, we’ll figure it out.</p>
<h1 id="heading-google-s2-geometry">Google S2 Geometry</h1>
<p>S2 (aka the S2 Geometry Library) is a <strong>spherical geometry library</strong> and <strong>spatial indexing system</strong>. It is an open-source C++ library developed and used extensively by Google.</p>
<p>It <strong>works directly on a sphere</strong> instead of projecting the world onto a flat 2D plane. This makes it robust for global applications, avoiding the distortions or seams—like those at the Poles or the International Date Line—that plague flat maps.</p>
<p>Internally, S2 is a set of math functions mapping 2D (latitude, longitude) points to S2 cell IDs. This turns 2D coordinate pairs (double, double) into highly cacheable 1D lookups on 64-bit integers.</p>
<p>The S2 library solves a fundamental problem in global mapping: <strong>how to represent and query locations (like points, lines, and polygons)</strong> anywhere on Earth without the distortion, discontinuities, and singularities that occur when using traditional planar map projections (like the Mercator projection).</p>
<h2 id="heading-s2-architecture">S2 Architecture</h2>
<p>S2 performs a 3-step transformation:</p>
<ol>
<li><p><strong>Sphere to Cube:</strong> It projects the Earth onto the six square faces of a cube. This avoids the singularities at the poles and reduces the distortion found in traditional map projections.</p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1765823538764/d9e5d2c0-7d0c-4d0f-afe2-3531dc155c88.png" alt class="image--center mx-auto" /></p>
</li>
<li><p><strong>Hierarchical Decomposition:</strong> Each face is divided into a <strong>quadtree</strong>. This subdivision can go as deep as <strong>Level 30</strong>, where a single leaf node represents roughly <strong>0.74 cm² to 1 cm²</strong> of the Earth's surface.</p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1765823554077/03aa247a-f25a-409e-abbe-7b7bace3dc59.png" alt class="image--center mx-auto" /></p>
</li>
<li><p><strong>Linearization:</strong> Finally, S2 uses a <strong>Hilbert Space-Filling Curve</strong> to traverse and number the cells. This converts 2D positions into a 1D index (the Cell ID).</p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1765823605307/38c1aa71-aa1c-48f1-9e3d-1850a15c68af.png" alt class="image--center mx-auto" /></p>
</li>
</ol>
<h3 id="heading-google-s2-visual-demo-httpss2vizpavittarxcomhttpss2vizpavittarxcom"><strong>Google S2 Visual Demo</strong> - <a target="_blank" href="https://s2viz.pavittarx.com/">https://s2viz.pavittarx.com/</a></h3>
<h3 id="heading-what-is-hilbert-curve"><strong>What is Hilbert Curve?</strong></h3>
<p>The <strong>Hilbert Curve</strong> is the secret sauce of the S2 Library. It is used to map the sphere onto a <strong>1D index</strong>.</p>
<p>The Hilbert Curve has a crucial property: two points that are close on the curve will also be close on the map. Hence, the Hilbert Curve <strong>preserves spatial locality</strong>.</p>
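<p>To make spatial locality concrete, here is a minimal sketch of the classic (x, y)-to-Hilbert-index mapping on an n × n grid. This is the textbook algorithm, not S2's actual implementation, and <code>xy2d</code> is an illustrative name:</p>

```python
def xy2d(n: int, x: int, y: int) -> int:
    """Map (x, y) on an n x n grid (n a power of two) to its Hilbert index."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) > 0 else 0
        ry = 1 if (y & s) > 0 else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:                  # rotate the quadrant so the sub-curve
            if rx == 1:              # always has the same orientation
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d

# Spatial locality: cells adjacent on the curve are adjacent on the grid,
# so nearby 1D indices always mean nearby 2D places.
print(xy2d(4, 0, 0), xy2d(4, 1, 0), xy2d(4, 1, 1))  # 0 1 2
```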
<p>Google S2 powers a variety of applications and systems. It is employed by <strong>Google Maps</strong> and <strong>MongoDB</strong> (for its geospatial indexing). It is famously used by <strong>Pokémon Go</strong> for the placement and distribution of in-game assets.</p>
<h1 id="heading-uber-h3">Uber H3</h1>
<p>Uber’s H3 (Hexagonal Hierarchical Spatial Index) is designed for smoothing, movement, and analytics, whereas S2 is tailored more towards storage and containment. H3 utilizes a hexagonal system for geospatial indexing, a departure from the traditional square-based approach used in previous algorithms.</p>
<p><strong>Uber H3 (Hexagonal Index)</strong> is built for <strong>analyzing movement—like tracking rides or visualizing traffic</strong>. S2, on the other hand, <strong>is better for just storing location data.</strong></p>
<p>Quadtrees use squares, which have <strong>8 neighbors (4 distinct edges and 4 corners)</strong>. Consequently, moving <strong>diagonally covers more distance</strong> than moving horizontally, causing <strong>heatmaps to look blocky</strong>—much like Minecraft. Hexagons, conversely, create smooth and balanced grids. A <strong>hexagon has 6 neighbors</strong>, each <strong>equidistant from the center,</strong> meaning neighbor traversal uses the same math in every direction. This results in uniform movement and heatmaps that look smooth and natural.</p>
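<p>The equidistant-neighbor property is easy to verify with axial hex coordinates, a common convention for hex grids (the function names below are illustrative, not H3's API):</p>

```python
import math

# Axial hex coordinates: every cell has exactly 6 neighbors, and every
# neighbor's center is the same distance away. On a square grid, diagonal
# neighbors are sqrt(2) farther than edge neighbors.
HEX_DIRS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]

def hex_center(q: int, r: int, size: float = 1.0):
    """Cartesian center of a pointy-top hexagon at axial (q, r)."""
    return (size * math.sqrt(3) * (q + r / 2), size * 1.5 * r)

cx, cy = hex_center(0, 0)
dists = [math.dist((cx, cy), hex_center(dq, dr)) for dq, dr in HEX_DIRS]
print([round(d, 4) for d in dists])   # all six equal sqrt(3) ~ 1.7321
```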
<h2 id="heading-h3-architecture">H3 Architecture</h2>
<p>Standard maps often use cylindrical projections (like Mercator) which distort size near the poles. H3 takes a completely different approach to minimize this distortion.</p>
<ol>
<li><p><strong>Sphere to Icosahedron (20 sided die)</strong></p>
<p> It projects the Earth's spherical surface onto these 20 flat triangular faces. Because an icosahedron approximates a sphere much better than a cylinder or a cube does, the distortion of landmasses is significantly lower.</p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1765827475704/b4af2434-009a-4659-a214-23a73d52f194.png" alt class="image--center mx-auto" /></p>
</li>
</ol>
<ol start="2">
<li><p><strong>Hexagons replace triangles</strong></p>
<p> H3 fills the faces of the icosahedron with hexagons. A triangle on a grid has only 3 edge neighbors, but the <em>vertices</em> of the icosahedron meet in complex ways (<strong>5 triangles meet at a point</strong>), resulting in up to <strong>12 neighbors</strong>, which makes <strong>movement harder to calculate</strong>. Hexagons, by contrast, are <strong>equidistant: the distance from the center of one hexagon to the center of each neighbor is the same.</strong></p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1765827455967/5a0d9eb4-7a85-4cab-bc82-c71785058a0d.png" alt class="image--center mx-auto" /></p>
</li>
<li><p><strong>Aperture-7 System</strong><br /> H3 has 16 levels of resolution, and each cell splits into 7 child hexagons. H3 uses fuzzy containment, as 7 hexagons <strong>do not</strong> fit perfectly inside a larger hexagon; they spill over the edges slightly, so you cannot say a child is 100% inside its parent. This is excellent for gradients, which naturally blur data: exactly what heatmaps or pricing need. You want them to transition smoothly, not cut sharply.</p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1765827443779/237232c2-61b4-4860-9c20-90f144f91c67.png" alt class="image--center mx-auto" /></p>
</li>
</ol>
<p>The Pentagon is here... Well, the mathematical one, not the military one.</p>
<p>In order to make the math work, H3 hides 12 pentagons (5-sided polygons) at specific points on Earth (mostly over oceans) to close the sphere. Why, you ask? You cannot tile a sphere <em>perfectly</em> with only hexagons (it is mathematically impossible, by Euler's formula).</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1765827422180/9085978f-3a72-41e8-b491-33f707ca5c15.png" alt class="image--center mx-auto" /></p>
<p>If a cell has high demand, Uber smooths out the surge to neighboring cells, so prices do not spike sharply just because you walked 10 meters.</p>
<h2 id="heading-h3-visualizer-httpsh3vizpavittarxcomhttpsh3vizpavittarxcom">H3 Visualizer - <a target="_blank" href="https://h3viz.pavittarx.com/">https://h3viz.pavittarx.com/</a></h2>
<p><strong>Related Links ...</strong></p>
<ul>
<li><p><a target="_blank" href="https://www.youtube.com/watch?v=eTYsIePy5zg">Why every world map is wrong - Kayla Wolf</a></p>
</li>
<li><p><a target="_blank" href="https://www.youtube.com/watch?v=lPNrtjboISg">How the World Map Looks Wildly Different Than You Think</a></p>
</li>
<li><p><a target="_blank" href="https://www.youtube.com/watch?v=3s7h2MHQtxc">Hilbert Curve</a></p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[Design Choices for Location Based Services / Part II]]></title><description><![CDATA[As we've seen, calculating locations in 2D space using latitude and longitude can be compute-intensive and time-consuming. However, queries and lookups in 1D are significantly faster. Therefore, we are exploring methods to map 2D coordinates (lat, lo...]]></description><link>https://blog.pavittarx.com/design-choices-for-location-based-services-part-ii</link><guid isPermaLink="true">https://blog.pavittarx.com/design-choices-for-location-based-services-part-ii</guid><category><![CDATA[designing-location-based-services-ii]]></category><category><![CDATA[#LocationServices ]]></category><category><![CDATA[Location Intelligence Market]]></category><category><![CDATA[gps]]></category><category><![CDATA[quadtree]]></category><dc:creator><![CDATA[Pavittar Singh]]></dc:creator><pubDate>Tue, 09 Dec 2025 16:58:22 GMT</pubDate><content:encoded><![CDATA[<p>As we've seen, calculating locations in 2D space using <strong>latitude and longitude</strong> can be <strong>compute-intensive</strong> and <strong>time-consuming</strong>. However, queries and lookups in <strong>1D</strong> are significantly faster. Therefore, we are exploring methods to map 2D coordinates (lat, long) to 1D space.</p>
<p>This can be achieved in two main ways: <strong>hash-based</strong> and <strong>tree-based</strong> methods. We've already examined hash-based approaches, namely <strong>Even Grid</strong> and <strong>GeoHash</strong>. In this blog post, we will be discussing <strong>Quad Tree</strong>.</p>
<h2 id="heading-quad-tree">Quad Tree</h2>
<p>A <strong>Quadtree</strong> is a tree data structure used to efficiently <strong>organize and partition a two-dimensional space</strong>. It recursively divides a region into <strong>four quadrants</strong> (<strong>quads</strong>) until the content of a quadrant meets a certain condition (e.g., a maximum number of points). It is an effective method for <strong>spatial indexing</strong>, allowing for fast lookups and queries on data points (like latitude and longitude) that are clustered in a 2D plane.</p>
<p>A Quadtree is an <strong>in-memory data structure</strong>, not a database solution. It typically runs on a <strong>Location-Based Service (LBS) Server</strong> and is built during the server's startup time. The following steps detail the construction process:</p>
<ol>
<li><p>The process begins with a single node (the <strong>root</strong>) that represents the entire 2D space being indexed (e.g., a city map).</p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1765295833458/ba03a3db-e0c0-416e-b58c-79b2b99501a6.png" alt class="image--center mx-auto" /></p>
</li>
<li><p>A node is split into four child regions if it contains more data points or objects than a predefined <strong>maximum capacity (threshold)</strong>.</p>
</li>
<li><p>The four child regions are typically designated as <strong>North-West, North-East, South-East, and South-West</strong>.</p>
</li>
<li><p>The points within the parent node are then redistributed into the new child quadrants based on their coordinates.</p>
</li>
<li><p>This same splitting rule is <strong>applied recursively</strong> to each of the four child nodes.</p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1765295862955/9ffb19b6-ad62-49d0-8621-5b433d96e5e3.png" alt class="image--center mx-auto" /></p>
</li>
<li><p>The process stops when a node (<strong>a leaf</strong>) satisfies the stopping condition—usually containing fewer points than the maximum capacity, or reaching a maximum defined depth.</p>
</li>
</ol>
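<p>The construction steps above can be sketched in code. The following is a minimal, illustrative Python quadtree, not production code; the capacity threshold of 4 and the half-open cell bounds are arbitrary choices for the example:</p>

```python
class QuadTree:
    """Minimal point quadtree: a leaf splits once it exceeds CAPACITY points."""
    CAPACITY = 4  # max points per leaf (arbitrary threshold for this example)

    def __init__(self, x, y, w, h):
        self.x, self.y, self.w, self.h = x, y, w, h  # cell bounds (origin + size)
        self.points = []      # points held while this node is a leaf
        self.children = None  # NW, NE, SE, SW once split

    def contains(self, px, py):
        return self.x <= px < self.x + self.w and self.y <= py < self.y + self.h

    def insert(self, px, py):
        if not self.contains(px, py):
            return False
        if self.children is None:
            self.points.append((px, py))
            if len(self.points) > self.CAPACITY:
                self._split()
            return True
        return any(c.insert(px, py) for c in self.children)

    def _split(self):
        hw, hh = self.w / 2, self.h / 2
        x, y = self.x, self.y
        self.children = [  # NW, NE, SE, SW quadrants
            QuadTree(x, y + hh, hw, hh), QuadTree(x + hw, y + hh, hw, hh),
            QuadTree(x + hw, y, hw, hh), QuadTree(x, y, hw, hh),
        ]
        for p in self.points:  # redistribute the parent's points into children
            any(c.insert(*p) for c in self.children)
        self.points = []

    def query(self, qx, qy, qw, qh, found=None):
        """Collect all points inside the query rectangle."""
        if found is None:
            found = []
        # prune subtrees whose cell does not overlap the query rectangle
        if (qx >= self.x + self.w or qx + qw <= self.x
                or qy >= self.y + self.h or qy + qh <= self.y):
            return found
        for px, py in self.points:
            if qx <= px < qx + qw and qy <= py < qy + qh:
                found.append((px, py))
        if self.children:
            for c in self.children:
                c.query(qx, qy, qw, qh, found)
        return found
```

<p>Here, <code>query</code> skips any subtree whose cell does not overlap the query rectangle, which is what makes lookups fast on clustered data.</p>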
<p>The Quadtree follows <strong>adaptive precision</strong>, meaning the structure automatically adjusts to the data density. It saves computational resources by not over-dividing empty regions. It is due to this property that it is used for <strong>collision detection</strong> in games and <strong>Level of Detail (LOD) rendering</strong> in graphics.</p>
<p>While Quadtrees are excellent for spatial analysis, it is useful to compare them to alternative methods. For instance, <strong>Geohash</strong> is typically preferred for <strong>high-volume writes</strong> (like tracking moving objects) and simplicity, while Quadtrees excel for complex spatial analysis and non-uniform data distribution.</p>
<p>Additionally, as the server builds the Quadtree at startup, indexing <strong>millions of records</strong> can result in the Location-Based Service (LBS) taking <strong>a few minutes to become operational</strong>. This creates a period of <strong>service unavailability</strong> during the startup process. To mitigate this downtime, a deployment strategy of <strong>incremental rollout</strong> (or canary deployment) can be used, updating only a small subset of servers at a time while maintaining the availability of the others.</p>
<p>This deployment method, however, may result in some servers serving <strong>stale data</strong> for a short period. An alternative is to update the Quadtree <strong>as changes occur</strong> (on-the-fly). While this ensures data freshness, it significantly <strong>complicates the design</strong> because it necessitates implementing <strong>locking mechanisms</strong> within the Quadtree structure to ensure <strong>thread safety</strong> in a multi-threaded environment.</p>
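<p>As a rough illustration of the thread-safety concern, here is a minimal sketch using a single coarse-grained lock. The <code>ConcurrentIndex</code> class and its set-based storage are stand-ins for the real tree, not an actual LBS implementation:</p>

```python
import threading

class ConcurrentIndex:
    """Coarse-grained locking around a shared in-memory index.

    A single lock is the simplest way to make on-the-fly updates thread-safe;
    real systems often prefer reader-writer locks, or build a fresh index in
    the background and swap it in atomically, so reads are never blocked.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._points = set()  # stand-in for the quadtree being updated

    def add(self, point):
        with self._lock:  # writers take the lock before mutating
            self._points.add(point)

    def remove(self, point):
        with self._lock:
            self._points.discard(point)

    def snapshot(self):
        with self._lock:  # readers see a consistent copy
            return set(self._points)
```
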
<p><strong>Additional Reads &amp; Watch</strong></p>
<ul>
<li><p>Quadtree - <a target="_blank" href="https://www.youtube.com/watch?v=jxbDYxm-pXg">https://www.youtube.com/watch?v=jxbDYxm-pXg</a></p>
</li>
<li><p>Booking.com uses Quadtree - <a target="_blank" href="https://medium.com/booking-com-development/how-to-search-point-of-interest-poi-markers-on-a-map-efficiently-b5f3f37a914a">https://medium.com/booking-com-development/how-to-search-point-of-interest-poi-markers-on-a-map-efficiently-b5f3f37a914a</a></p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[Design choices for building Location Based Services / Part I]]></title><description><![CDATA[Location-based services (LBS) have become an inherent part of our daily lives. From shopping orders, booking taxis, and finding nearby stores to building travel plans—everything is built upon location-based services.
Going down this rabbit hole, your...]]></description><link>https://blog.pavittarx.com/design-choices-for-building-location-based-services-part-i</link><guid isPermaLink="true">https://blog.pavittarx.com/design-choices-for-building-location-based-services-part-i</guid><category><![CDATA[location services]]></category><category><![CDATA[location ]]></category><dc:creator><![CDATA[Pavittar Singh]]></dc:creator><pubDate>Tue, 25 Nov 2025 18:10:07 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/_jFZFtCVdbc/upload/da224f93895e53ad6e278b18bb449267.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Location-based services (LBS) have become an inherent part of our daily lives. From shopping orders, booking taxis, and finding nearby stores to building travel plans—everything is built upon location-based services.</p>
<p>Going down this rabbit hole, your design choices will depend on the type of location service you are building. Is it a <strong>static location</strong> (a shop or business that doesn't move) or a <strong>dynamic location</strong> (nearby friends, a food order being delivered, a moving taxi)?</p>
<p>Let’s start with mapping static locations. The first intuitive idea is to draw a circle of a certain radius (say, 5km) and search for any businesses within that circle. This translates to querying by latitude and longitude in a SQL database. However, this requires searching 2D data; you must query for longitudes and latitudes separately and then calculate intersections to find the locations of interest.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764121078969/5c40a7eb-8e6a-4f47-85d7-5cd2821c4c02.png" alt class="image--center mx-auto" /></p>
<p><strong>The Problem:</strong> As the dataset grows, this becomes both slow and compute-intensive. While applying database indexes can improve performance, we are still left with a massive number of data points to intersect, which doesn't solve the scalability issue.</p>
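<p>To make the cost concrete, here is a minimal sketch of the naive approach: a full scan with a great-circle distance check against every row. The function names and the 5&nbsp;km default radius are illustrative only:</p>

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))  # mean Earth radius ~6371 km

def nearby(businesses, lat, lon, radius_km=5):
    """Naive O(n) scan: every row is distance-checked against the centre."""
    return [b for b in businesses if haversine_km(lat, lon, b[0], b[1]) <= radius_km]
```

<p>Every lookup touches every row, so the cost grows linearly with the dataset; an index on latitude or longitude alone still leaves a large candidate set to filter.</p>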
<p>The earlier approach shows us that 2D search is inefficient. The solution? Converting that 2D data into a 1D string. This is exactly what specialized databases do (e.g., Geo operations in Redis, PostGIS extension for Postgres).</p>
<p><strong>What is a Geohash?</strong> It is an indexing method for geographical data. Generally, there are two types of spatial indexing:</p>
<ul>
<li><p><strong>Hash-Based:</strong> Even Grid, Geohash, Cartesian Tiers, etc.</p>
</li>
<li><p><strong>Tree-Based:</strong> Quadtree, Google S2, R-Tree, etc.</p>
</li>
</ul>
<p>Broadly speaking, these methods divide the map into smaller areas and build indexes for faster searching. Geohash, Quadtree, and Google S2 are the most popular.</p>
<h2 id="heading-even-grid">Even Grid</h2>
<p>This method works by dividing the map into a grid of multiple smaller, evenly divided areas (buckets). An area can contain multiple businesses, but a specific business belongs to only one grid cell.</p>
<p>Each object (user, restaurant, taxi) is assigned to a cell based on GPS coordinates. To find a taxi, we simply look for taxis in the cell you are currently in. This makes the search extremely fast.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764089814467/10aa7bce-7de3-4c4e-8bde-51c3804cdfa7.png" alt class="image--center mx-auto" /></p>
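<p>A minimal sketch of this bucketing idea follows; the 0.05° cell size and the helper names are arbitrary choices for the example:</p>

```python
from collections import defaultdict

def grid_cell(lat, lon, cell_deg=0.05):
    """Map GPS coordinates to a (row, col) cell key; cell_deg is the cell
    size in degrees (an arbitrary choice for this example)."""
    return (int(lat // cell_deg), int(lon // cell_deg))

index = defaultdict(list)  # cell key -> objects currently in that cell

def add_taxi(taxi_id, lat, lon):
    index[grid_cell(lat, lon)].append(taxi_id)

def taxis_near(lat, lon):
    # a single dictionary lookup instead of a 2D range scan
    return index[grid_cell(lat, lon)]
```
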
<p><strong>The Drawback:</strong> While the grids are of even size, the <em>data</em> is not evenly distributed. A grid cell in Delhi might hold 10,000 items, while a cell in a rural area might hold zero. This creates "hot spots" alongside empty, wasted cells. Additionally, if a user is on the edge of a grid cell, finding the nearest items requires checking neighboring cells, which adds complexity to the logic.</p>
<h2 id="heading-geohash">GeoHash</h2>
<p>Geohash is generally superior to the simple Even Grid approach. It reduces 2D latitude/longitude data into a 1D string of letters and digits.</p>
<p>It works by recursively dividing areas into smaller and smaller grids with each additional character. Geohash is like a digital map that lets you zoom in. It supports 12 precision levels, though levels 4-6 are sufficient for most LBS applications.</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Level (Char Length)</strong></td><td><strong>Cell Size (Approx)</strong></td><td><strong>Use Case</strong></td></tr>
</thead>
<tbody>
<tr>
<td><strong>1</strong></td><td>5,000km x 5,000km</td><td>Continent / Large Region</td></tr>
<tr>
<td><strong>2</strong></td><td>1,250km x 625km</td><td>Large State / Country</td></tr>
<tr>
<td><strong>3</strong></td><td>156km x 156km</td><td>Cluster of Cities</td></tr>
<tr>
<td><strong>4</strong></td><td>39km x 19km</td><td>A City</td></tr>
<tr>
<td><strong>5</strong></td><td>4.9km x 4.9km</td><td>A District / Neighborhood</td></tr>
<tr>
<td><strong>6</strong></td><td><strong>1.2km x 0.6km</strong></td><td><strong>A few city blocks (Common for LBS)</strong></td></tr>
<tr>
<td><strong>7</strong></td><td>150m x 150m</td><td>A large building footprint</td></tr>
<tr>
<td><strong>8</strong></td><td>38m x 19m</td><td>A house property</td></tr>
<tr>
<td><strong>9</strong></td><td>4.8m x 4.8m</td><td>A specific room / parking spot</td></tr>
</tbody>
</table>
</div><p>We start by dividing the world into 4 quadrants, based on bits (0 or 1) -</p>
<ul>
<li><p>Go Left: Longitude range [-180, 0] is represented by 0</p>
</li>
<li><p>Go Right: Longitude range [0, 180] is represented by 1</p>
</li>
<li><p>Go Down: Latitude range [-90, 0] is represented by 0</p>
</li>
<li><p>Go Up: Latitude range [0, 90] is represented by 1</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764091061160/37f004c8-45ea-4973-873f-b970f4a06ef6.png" alt class="image--center mx-auto" /></p>
<p>We continue dividing these quadrants recursively until we reach the required granularity. These bits are then grouped (5 bits at a time) to form characters using <strong>Base32 encoding</strong>. The alphabet used is <code>0-9</code> and <code>b-z</code> (the letters a, i, l, and o are omitted to avoid confusion with similar-looking characters).</p>
<ul>
<li><p><strong>Level 1:</strong> The world is divided into 32 rectangles (e.g., <code>d</code>, <code>e</code>, <code>f</code>).</p>
</li>
<li><p><strong>Level 2:</strong> Each rectangle is further divided into 32 smaller ones (e.g., <code>da</code>, <code>db</code>).</p>
</li>
</ul>
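<p>The procedure above (recursive bisection producing bits, then Base32 grouping five at a time) can be sketched in Python. This is an illustrative implementation of the standard geohash algorithm, not a library API:</p>

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet: a, i, l, o excluded

def geohash_encode(lat, lon, precision=6):
    """Bisect longitude/latitude alternately, then Base32-encode 5 bits per character."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    bits = []
    is_lon = True  # even bit positions refine longitude, odd positions latitude
    while len(bits) < precision * 5:
        if is_lon:
            mid = (lon_lo + lon_hi) / 2
            bits.append(1 if lon >= mid else 0)
            if lon >= mid:
                lon_lo = mid  # right half -> 1
            else:
                lon_hi = mid  # left half -> 0
        else:
            mid = (lat_lo + lat_hi) / 2
            bits.append(1 if lat >= mid else 0)
            if lat >= mid:
                lat_lo = mid  # upper half -> 1
            else:
                lat_hi = mid  # lower half -> 0
        is_lon = not is_lon
    chars = []
    for i in range(0, len(bits), 5):  # 5 bits -> one Base32 character
        idx = 0
        for b in bits[i:i + 5]:
            idx = (idx << 1) | b
        chars.append(BASE32[idx])
    return "".join(chars)
```

<p>Because each extra character only refines the previous cell, a longer hash of the same point always begins with the shorter one, which is exactly the prefix property used for proximity searches.</p>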
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764091692724/9344274e-eccd-458d-b91d-921ca005e03c.png" alt class="image--center mx-auto" /></p>
<p>This is a global standard. For example:</p>
<ul>
<li><p><code>d</code>: Covers the Eastern US and the Atlantic.</p>
</li>
<li><p><code>e</code>: Covers Europe and West Africa.</p>
</li>
<li><p><code>f</code>: Covers Eastern Canada and Greenland.</p>
</li>
</ul>
<p><strong>The Advantage:</strong> This allows for <strong>Prefix Searching</strong>. If User A is at <code>tdr12</code> and User B is at <code>tdr15</code>, you know they are close because they share the prefix <code>tdr1</code>.</p>
<p><strong>The Edge Case (Boundary Issues):</strong> Geohash works best most of the time, but like physical countries, it has boundary issues. Geohash guarantees that if two hashes share a long prefix, they are close. However, <strong>the opposite is not true</strong>: if there is no shared prefix, it does <em>not</em> mean the places are far apart.</p>
<p>For example, take two people standing in Greenwich Park (the Prime Meridian):</p>
<ul>
<li><p><strong>Person Left:</strong> Geohash <code>gcpvj...</code></p>
</li>
<li><p><strong>Person Right:</strong> Geohash <code>u10hb...</code></p>
</li>
</ul>
<p>A query <code>SELECT * WHERE geohash LIKE 'gcpv%'</code> will miss the person standing one meter away because their hash starts with <code>u</code>.</p>
<p>To solve this, we cannot rely on the prefix alone. The standard solution is to fetch businesses not just from the current cell, but also from the <strong>8 neighboring cells</strong>. This ensures complete coverage and can be done in constant time.</p>
<h2 id="heading-up-next">Up Next …</h2>
<ul>
<li><p><strong>Quad Tree</strong></p>
</li>
<li><p><strong>Google S2</strong></p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[Unique IDs... Which is the best?]]></title><description><![CDATA[When thinking of generating unique IDs, the very first approach that comes to mind is using auto-incrementing integer IDs in traditional databases. This works well for small systems or systems with a single database. However, you cannot use this meth...]]></description><link>https://blog.pavittarx.com/unique-ids-which-is-the-best</link><guid isPermaLink="true">https://blog.pavittarx.com/unique-ids-which-is-the-best</guid><category><![CDATA[distributed system]]></category><category><![CDATA[unique identifier]]></category><category><![CDATA[scaling]]></category><dc:creator><![CDATA[Pavittar Singh]]></dc:creator><pubDate>Tue, 11 Nov 2025 05:44:23 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/Oaqk7qqNh_c/upload/5ae2d12a7abe215c0818131ff23079c4.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>When thinking of generating unique <strong>IDs</strong>, the very first approach that comes to mind is using auto-incrementing integer IDs in traditional databases. This works well for small systems or systems with a single database. However, you cannot use this method when building large-scale distributed applications.</p>
<p>The reason is that there is no way for different systems to coordinate what the next sequence number should be. Consequently, all nodes will generate their own sequences (e.g., 1, 2, 3), resulting in the same ID pointing to different records across the system.</p>
<h2 id="heading-multi-master-replication"><strong>Multi-Master Replication</strong></h2>
<p>We can implement a variation of the auto-increment feature where, instead of increasing the next ID by 1, we increase it by <strong>k</strong>, where <strong>k</strong> is the total number of nodes in the distributed system.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762581332992/1b7a60f4-2885-4888-985c-be4b2f2df854.png" alt class="image--center mx-auto" /></p>
<p>The next ID generated on a single system has a difference of <strong>k</strong> from its last ID, so the IDs generated on two different nodes will never collide. This solves the scalability issue of the basic auto-increment approach. However, this method still has several issues:</p>
<ul>
<li><p>It does not scale well when a server is added or removed (as <strong>k</strong> would need to change).</p>
</li>
<li><p>IDs do not increase chronologically across the entire system.</p>
</li>
<li><p>It is hard to scale across multiple data centers.</p>
</li>
</ul>
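<p>A minimal sketch of this scheme, with hypothetical class and method names:</p>

```python
class MultiMasterIdGenerator:
    """Node i of k nodes generates the arithmetic sequence i, i + k, i + 2k, ..."""

    def __init__(self, node_index, total_nodes):
        assert 0 <= node_index < total_nodes
        self.step = total_nodes     # increment by k, the number of nodes
        self.next_id = node_index   # start at this node's own offset

    def generate(self):
        current = self.next_id
        self.next_id += self.step
        return current

# Two nodes in a 2-node cluster can never collide:
node0 = MultiMasterIdGenerator(0, 2)  # 0, 2, 4, ...
node1 = MultiMasterIdGenerator(1, 2)  # 1, 3, 5, ...
```
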
<h2 id="heading-ticket-server">Ticket Server</h2>
<p>A Ticket Server is another way of generating unique IDs. This approach, used by services like Flickr and Etsy, uses a centralized, dedicated server to generate globally unique primary keys.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762591230573/96a6bb66-f65e-4cb7-b36c-e12d01a86f4c.png" alt class="image--center mx-auto" /></p>
<p>It is relatively easy to implement and scales from small- to medium-scale systems. However, since it is centralized, it creates a <strong>single point of failure (SPOF)</strong>. Any issue with the ticket server can halt or bring down the whole system. While you could introduce distributed ticket servers, that brings its own challenges, such as data synchronization.</p>
<h2 id="heading-uuid-universally-unique-identifier-v4">UUID (Universally Unique Identifier) v4</h2>
<p>A UUID is a 128-bit number used to identify information in computer systems. UUIDs have a very low probability of <strong>collision</strong>. An example is <code>1e653ffd-5b98-4648-8571-5848edb0b7fe</code>.</p>
<p>It consists of a 128-bit (16-byte) block of data, where 6 bits are reserved for version and variant flags, and the remaining 122 bits are for random data.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762619620332/72aaf885-e477-415a-a56f-72e078727e34.png" alt class="image--center mx-auto" /></p>
<p>Since UUID v4 is random, it can be generated without coordination between server nodes, thus avoiding synchronization issues. This makes it highly scalable, as each node can generate IDs independently. However, this approach still has a few issues:</p>
<ul>
<li><p>They are non-numeric, which can be inefficient for storage compared to a 64-bit integer.</p>
</li>
<li><p>UUID v4 is purely random, meaning the IDs do not increase chronologically and are therefore not sortable.</p>
</li>
<li><p>Their 128-bit size is larger than a 64-bit integer, which can impact storage and index size.</p>
</li>
</ul>
<p>UUID v4 provides good random data distribution. (Other random ID formats include Nano ID and CUID2.) By "random data distribution," we mean that if many UUID v4 IDs are generated and inserted, they will be scattered randomly throughout the database's index. This is good for distributed <strong>(hash-based)</strong> databases but can lead to <strong>index fragmentation</strong> and increased disk I/O in traditional single-database (B-Tree) systems.</p>
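<p>In Python, for instance, UUID v4 generation is a one-liner from the standard library:</p>

```python
import uuid

u = uuid.uuid4()                   # 122 random bits plus version/variant flags
print(u)                           # e.g. 1e653ffd-5b98-4648-8571-5848edb0b7fe
assert u.version == 4              # 4 bits fixed to the version number
assert u.variant == uuid.RFC_4122  # 2 bits fixed to the RFC 4122 variant
# No coordination needed: independently generated IDs will (almost certainly) differ.
assert uuid.uuid4() != uuid.uuid4()
```
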
<h2 id="heading-twitter-snowflake-id">❄️ Twitter Snowflake ID</h2>
<p>Twitter Snowflake is a time-based unique ID generation algorithm developed by X (formerly Twitter). It was designed to address the problem of generating unique IDs for distributed systems at scale.</p>
<p>Snowflake was designed to overcome the shortcomings of other methods. For example, Multi-Master Replication is hard to scale, and Ticket Servers create a single point of failure. While UUID v4 avoids these issues, its random nature means it isn't time-sortable.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762619646213/11fe3113-7377-4c6d-a8bd-de254eff1bba.png" alt class="image--center mx-auto" /></p>
<p>The Snowflake ID is a 64-bit ID designed for high-throughput, distributed systems. It consists of:</p>
<ul>
<li><p><strong>Sign Bit (1 bit):</strong> It will always be 0, reserved for future use.</p>
</li>
<li><p><strong>Timestamp (41 bits):</strong> Milliseconds since a custom epoch. Twitter's default epoch is <code>1288834974657</code> (Nov 04, 2010). This is the most important part, as the timestamp ensures that IDs are sortable by time. 41 bits provide for ~69 years of IDs.</p>
</li>
<li><p><strong>Datacenter ID (5 bits):</strong> Allows for 2^5 = 32 datacenters.</p>
</li>
<li><p><strong>Machine ID (5 bits):</strong> Allows for 2^5 = 32 machines per datacenter.</p>
</li>
<li><p><strong>Sequence number (12 bits):</strong> A counter for IDs generated within the same millisecond on the same machine. It is reset to 0 every millisecond and allows for 2^12 = 4096 IDs per millisecond, per machine.</p>
</li>
</ul>
<p>Datacenter and machine IDs are typically chosen at startup and fixed. Any accidental change in these values can lead to ID conflicts.</p>
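<p>The bit layout above can be sketched as follows. This is an illustrative Python version, not Twitter's actual implementation; a real generator must also handle the clock moving backwards:</p>

```python
import threading
import time

TWITTER_EPOCH = 1288834974657  # ms since Unix epoch (Nov 04, 2010)

class Snowflake:
    """Sketch of a Snowflake-style generator:
    41-bit timestamp | 5-bit datacenter | 5-bit machine | 12-bit sequence."""

    def __init__(self, datacenter_id, machine_id):
        assert 0 <= datacenter_id < 32 and 0 <= machine_id < 32
        self.datacenter_id = datacenter_id
        self.machine_id = machine_id
        self.last_ms = -1
        self.sequence = 0
        self._lock = threading.Lock()  # one generator may serve many threads

    def next_id(self):
        with self._lock:
            now = int(time.time() * 1000) - TWITTER_EPOCH
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & 0xFFF  # 12-bit counter
                if self.sequence == 0:  # exhausted 4096 IDs this millisecond
                    while now <= self.last_ms:  # spin until the next millisecond
                        now = int(time.time() * 1000) - TWITTER_EPOCH
            else:
                self.sequence = 0
            self.last_ms = now
            return ((now << 22) | (self.datacenter_id << 17)
                    | (self.machine_id << 12) | self.sequence)
```

<p>Because the timestamp occupies the most significant bits, IDs sort by creation time; the datacenter, machine, and sequence fields only break ties within a millisecond.</p>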
<h2 id="heading-ulid-universally-unique-lexographically-sortable-identifier">ULID (Universally Unique Lexicographically Sortable Identifier)</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762622043900/c10edaa8-3286-409a-9903-23c4680adfca.png" alt class="image--center mx-auto" /></p>
<p>ULID is also a 128-bit (16-byte) ID, similar in spirit to Twitter Snowflake and UUID v7.</p>
<ul>
<li><p>It uses a 48-bit timestamp (Unix epoch in <strong>milliseconds</strong>).</p>
</li>
<li><p>It is <strong>lexicographically</strong> sortable and has time-based locality.</p>
</li>
<li><p>It is not <em>natively</em> compatible with UUID, so migration can be difficult.</p>
</li>
<li><p>It can be generated offline without coordination.</p>
</li>
</ul>
<p>If you want a time-sortable ID that <strong>is</strong> a native, specification-compliant UUID, you should use <strong>UUID v7</strong>.</p>
<h2 id="heading-uuid-universally-unique-identifier-v7">UUID (Universally Unique Identifier) v7</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762620517402/51b5ab60-2a16-4d84-be97-51864b06b897.png" alt class="image--center mx-auto" /></p>
<p>UUID v7 uses a 48-bit timestamp, 74 bits for randomness, and 6 bits for version and variant. The timestamp is based on the Unix epoch in <strong>milliseconds</strong> (not seconds). It is designed for high-load databases and distributed systems.</p>
<p>UUID v7 is best for systems that require records to be stored in the order they were created, such as logs, database indexes, and audit trails. UUIDs are supported as a native type by many databases; alternatively, a <code>Binary(16)</code> data type can be used to store them.</p>
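<p>Very recent Python versions ship a native <code>uuid.uuid7()</code>; where it is unavailable, a v7-style value can be assembled by hand. The helper below is a sketch (the function name is ours) that follows the layout described above: 48 timestamp bits, 4 version bits, then the 12 + 62 random bits around the 2 variant bits:</p>

```python
import os
import time
import uuid

def uuid7_like() -> uuid.UUID:
    """Assemble a UUID v7-style value: 48-bit ms timestamp, then randomness.

    Layout, most to least significant:
    48 ts | 4 version (0b0111) | 12 rand | 2 variant (0b10) | 62 rand.
    """
    ts_ms = int(time.time() * 1000) & ((1 << 48) - 1)
    rand_a = int.from_bytes(os.urandom(2), "big") & 0x0FFF           # 12 random bits
    rand_b = int.from_bytes(os.urandom(8), "big") & ((1 << 62) - 1)  # 62 random bits
    value = (ts_ms << 80) | (0x7 << 76) | (rand_a << 64) | (0b10 << 62) | rand_b
    return uuid.UUID(int=value)
```

<p>Since the millisecond timestamp sits in the most significant bits, later IDs compare greater than earlier ones, which is what keeps inserts append-only in a B-Tree index.</p>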
<h2 id="heading-which-id-is-the-best">Which ID is the best?</h2>
<p>Ultimately, we are left with two kinds of IDs: random and time-based sortable. The best choice depends on the kind of database you are using.</p>
<ul>
<li><p><strong>B-Tree based Databases (MySQL, PostgreSQL, etc.):</strong> <strong>Time-based IDs are best.</strong> These databases sort keys in an index. Random IDs are bad here, as they cause index fragmentation, making <code>INSERT</code> operations very slow. Sequential IDs are fast because they are simply appended to the end of the index.</p>
</li>
<li><p><strong>Hash-Based Distributed Databases (DynamoDB, Cassandra, etc.):</strong> <strong>Random IDs (UUID v4) are best.</strong> Time-based IDs are <strong>bad</strong> here because they cause <strong>write hotspots</strong>, making one server do all the work. These databases "distribute" data by hashing the ID to decide which server (node) to store it on. If you use a time-based ID, all new IDs will have a similar prefix (the timestamp) and will be hashed to the same server, creating a bottleneck. Random IDs distribute the load perfectly.</p>
</li>
</ul>
<p>In the end, the quest for the "perfect" unique ID reveals a core principle of system design: there is no single best solution, only the right one for the job. As we've seen, the choice between a random or a time-sortable ID is a critical decision that depends entirely on your database architecture. Choosing a random ID for a B-Tree database can lead to crippling fragmentation, while using a sequential ID on a hash-based system can create debilitating hotspots. The best ID, therefore, is the one that aligns with how your data is stored and distributed.</p>
]]></content:encoded></item><item><title><![CDATA[How Consistent Hashing Scales Distributed Systems]]></title><description><![CDATA[Scaling is the process of increasing the system’s capacity to handle a growing amount of work. It's about designing your application or infrastructure so it can manage more users, data, or requests without failing or becoming slow.
There were two major wa...]]></description><link>https://blog.pavittarx.com/how-consistent-hashing-scales-distributed-systems</link><guid isPermaLink="true">https://blog.pavittarx.com/how-consistent-hashing-scales-distributed-systems</guid><category><![CDATA[consistent hashing]]></category><category><![CDATA[distributed system]]></category><category><![CDATA[scalability]]></category><dc:creator><![CDATA[Pavittar Singh]]></dc:creator><pubDate>Tue, 28 Oct 2025 17:53:54 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/OwqLxCvoVxI/upload/37143ab2549a0ac14268b41580ad296f.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>Scaling is the process of increasing the system’s capacity to handle a growing amount of work.</strong> It's about designing your application or infrastructure so it can manage more users, data, or requests without failing or becoming slow.</p>
<p>There are two major ways of scaling systems —</p>
<ol>
<li><p><strong>Vertical Scaling (Scaling Up)</strong></p>
<p> You might add more power to your system (<strong>more cores, more RAM, more storage, faster SSDs</strong>), increasing its ability to handle load. This is also the easiest approach, since it requires no changes to the application or its architecture. On the flip side, however, it is expensive (high-end hardware costs more) and limited (you will soon hit the machine’s maximum capacity). Additionally, upgrades require downtime, and a single failure can bring down the entire application.</p>
</li>
<li><p><strong>Horizontal Scaling (Scaling Out)</strong><br /> This involves <strong>adding more servers to distribute the system load.</strong> This could mean distributing your database or application (or parts of it) onto separate machines.<br /> It is a more fault-tolerant solution: it brings near-unlimited scalability, it is cheaper (you can use multiple standard machines rather than one high-end one), and upgrades involve zero downtime. However, it is much more complex, as it requires changes to the architecture as well as application-level code. You need shared state to keep the application in sync, load balancers to manage the traffic, and so on.</p>
</li>
</ol>
<p>There is also database-level scaling (caching, read replicas, and sharding) and application-level scaling (asynchronous processing, microservices), both of which are used alongside the two techniques above.</p>
<p>One of the major problems in horizontal scaling is how to distribute the traffic/load on multiple servers efficiently.</p>
<h2 id="heading-the-rehashing-problem"><strong>The Rehashing Problem</strong></h2>
<p>The simplest way to distribute the load among N servers is the formula <code>serverIndex = hash(key) % N</code>.</p>
<p>For example, with 4 servers, we use the modulo operation <code>hash(key) % 4</code> to find the server where a key is stored. This works reasonably well for a <strong>fixed number</strong> of <strong>servers</strong> if the data distribution is even, but that is seldom the case in real-world distributed <strong>systems</strong>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761625461732/77694c77-6795-4c3c-a8f4-c8ced93a3a98.png" alt class="image--center mx-auto" /></p>
<p>Now, if a server crashes (say, we go from 4 servers to 3), the formula becomes <code>hash(key) % 3</code>. This change causes a catastrophic remapping. Here is what happens:</p>
<ol>
<li><p>The result of <code>hash(key) % 3</code> will differ from <code>hash(key) % 4</code> for the majority of keys — <strong>almost all keys will need to be remapped,</strong> not just the keys on Server 3.</p>
</li>
<li><p>Your cache is suddenly empty, since keys now map to different servers and the application cannot find the expected data.</p>
</li>
<li><p>All cache misses directly hit your database, overloading it and likely bringing it to a halt. This is often called the <strong>thundering herd</strong> problem.</p>
</li>
<li><p>You will need to <strong>remap</strong> almost all <strong>K</strong> keys in the system, which is an $O(K)$ operation that can grind the system to a halt.</p>
</li>
</ol>
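<p>The scale of the remapping is easy to demonstrate. In this sketch, MD5 is just a stable stand-in hash (Python's built-in <code>hash()</code> is salted per process):</p>

```python
import hashlib

def stable_hash(key: str) -> int:
    """Deterministic hash of a string key."""
    return int.from_bytes(hashlib.md5(key.encode()).digest(), "big")

keys = [f"user:{i}" for i in range(10_000)]
before = {k: stable_hash(k) % 4 for k in keys}  # 4 servers
after = {k: stable_hash(k) % 3 for k in keys}   # one server lost

moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved / len(keys):.0%} of keys changed servers")  # roughly 3 in 4 move
```

<p>A key stays put only when <code>hash % 4</code> and <code>hash % 3</code> agree, which for uniformly distributed hashes happens about a quarter of the time, so roughly 75% of keys are remapped.</p>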
<p>Consistent hashing solves this.</p>
<h2 id="heading-consistent-hashing">Consistent Hashing</h2>
<p>The primary benefit of consistent hashing is that when a server (or slot) is added or removed, only <code>k/n</code> <strong>keys</strong> need to be remapped on average, where <code>k</code> is the total number of keys and <code>n</code> is the number of servers.</p>
<p>It works by placing servers on a conceptual ring. It then maps a key to the first server it "sees" by moving clockwise around the ring from the key's position. When a server is added or removed, only the keys in its specific arc are affected. The vast majority of keys stay exactly where they are.</p>
<p><strong>Hash Ring</strong></p>
<p>The hash ring represents the <em>entire</em> output space of the hash function being used. For example, if we use SHA-1 as the hash function, its hash space goes from <code>0</code> to <code>2^160 - 1</code>; call the positions on the ring <code>x0</code> … <code>xN</code>, where <code>x0</code> is 0 and <code>xN</code> is 2^160 - 1.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761660351113/48e61af6-a9a0-4275-81e0-6eec57cf766a.png" alt class="image--center mx-auto" /></p>
<p><strong>Hash Servers</strong> - We map servers onto the ring by hashing their IP address or server name with the same hash function. Note that, unlike the rehashing approach, no modulo operation is applied here.</p>
<p>In consistent hashing, we place both the keys and the servers onto this same ring. To determine which server a key is stored on, we travel clockwise from the key's location until a server is found.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761662741195/40926d87-f2d4-4b76-a354-61e6c4b744b2.png" alt class="image--center mx-auto" /></p>
<p>For example, in the figure above, if we need to find the server for <code>k0</code>, we go clockwise from <code>k0</code> until we reach <code>s1</code>. Hence, key <code>k0</code> is stored on server <code>s1</code>. Similarly, keys <code>k4</code> and <code>k5</code> are stored on server <code>s3</code>.</p>
<p>Now, suppose we:</p>
<ol>
<li><p>add a server <code>s5</code> between <code>k0</code> and <code>k1</code>. Key <code>k0</code> will now be found on server <code>s5</code>, while <code>k1</code> and <code>k2</code> will still be found on server <code>s1</code>. So only one key needed to be remapped.</p>
</li>
<li><p>remove server <code>s3</code>. Then only keys <code>k4</code> and <code>k5</code> need to be remapped, to server <code>s4</code>.</p>
</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761663088996/608dff15-a29e-4079-b4c9-7ad39bdb5e8a.png" alt class="image--center mx-auto" /></p>
<p>By now it should be obvious that this approach creates two further problems —</p>
<ol>
<li><p>It is possible for servers to be clustered unevenly on the ring, creating large gaps. See how <code>s4</code>, <code>s5</code>, <code>s1</code>, and <code>s2</code> cover only half of the ring.</p>
</li>
<li><p>It is possible for keys to be unevenly distributed, with almost all the keys (hotspots) assigned to only a few servers.</p>
</li>
</ol>
<h2 id="heading-virtual-nodes">Virtual Nodes</h2>
<p><strong>Virtual nodes</strong> are used to solve the problem of uneven key distribution in consistent hashing. Instead of mapping a physical server to one point, the server is represented by many <strong>virtual nodes</strong> at multiple places on the ring. In a real-world system, this number is typically large, which reduces the probability of an uneven distribution of keys. This way, a single server is responsible for many small partitions of the ring.</p>
<p>This large number of virtual nodes helps balance the distribution of keys among the <strong>servers</strong> on the ring. The standard deviation of the key distribution will be smaller when we increase the number of virtual nodes. However, more space is needed to store the metadata mapping these virtual nodes to their physical servers. This is a tradeoff, and we can tune the number of virtual nodes to fit our system's requirements.</p>
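<p>Putting the ring and virtual nodes together, here is a minimal sketch. The class and method names are illustrative, and MD5 stands in for the ring's hash function:</p>

```python
import bisect
import hashlib

def ring_hash(key: str) -> int:
    """Stable position on the ring (MD5 stands in for SHA-1 here)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest(), "big")

class ConsistentHashRing:
    """Consistent hashing with virtual nodes; the vnode count is tunable."""

    def __init__(self, vnodes=100):
        self.vnodes = vnodes
        self._positions = []  # sorted vnode positions on the ring
        self._owner = {}      # vnode position -> physical server

    def add_server(self, server):
        for i in range(self.vnodes):  # one server -> many points on the ring
            pos = ring_hash(f"{server}#vn{i}")
            bisect.insort(self._positions, pos)
            self._owner[pos] = server

    def remove_server(self, server):
        self._positions = [p for p in self._positions if self._owner[p] != server]
        self._owner = {p: s for p, s in self._owner.items() if s != server}

    def get_server(self, key):
        if not self._positions:
            raise RuntimeError("no servers on the ring")
        # first vnode clockwise from the key, wrapping around past the end
        i = bisect.bisect_right(self._positions, ring_hash(key))
        return self._owner[self._positions[i % len(self._positions)]]
```

<p>Removing a server relocates only the keys whose first clockwise virtual node belonged to it — roughly <code>k/n</code> of the total — while every other key keeps its server.</p>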
<h2 id="heading-applications">Applications</h2>
<ul>
<li><p>Discord Chat Application</p>
</li>
<li><p>Akamai Content Delivery Network</p>
</li>
<li><p>Partitioning Component of Amazon’s Dynamo Database.</p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[Deciphering CAP: The Fundamental Trade-off in Distributed Systems]]></title><description><![CDATA[CAP Theorem is measure that can help you assess the tradeoffs when designing distributed systems. The trade-off is between Consistency, Availability and Partition Tolerance.
All of these three are inter-related, with subtle difference, and all of the...]]></description><link>https://blog.pavittarx.com/deciphering-cap-the-fundamental-trade-off-in-distributed-systems</link><guid isPermaLink="true">https://blog.pavittarx.com/deciphering-cap-the-fundamental-trade-off-in-distributed-systems</guid><category><![CDATA[CAP-Theorem]]></category><category><![CDATA[Developer]]></category><category><![CDATA[System Design]]></category><category><![CDATA[System Architecture]]></category><category><![CDATA[consistency]]></category><category><![CDATA[distributed systems]]></category><dc:creator><![CDATA[Pavittar Singh]]></dc:creator><pubDate>Tue, 21 Oct 2025 18:24:54 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/cqH4_fBvVPg/upload/31b3c89e94a8b8fd8f00b2a088570fba.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The CAP theorem is a measure that can help you assess the trade-offs when designing distributed systems. The trade-off is between <strong>Consistency, Availability and Partition Tolerance</strong>.</p>
<p>All three are inter-related, with subtle differences, and they cannot all be guaranteed at the same time. Before we dive any further into the CAP theorem, let's take a closer look at each of the three —</p>
<ol>
<li><p><strong>Consistency</strong> — is about whether the data a client sees is up to date. In a system with strong consistency, all nodes show the most recent data or throw errors. There is no in-between.<br /> Note that consistency here is different from the consistency guarantee in databases with ACID transactions.</p>
</li>
<li><p><strong>Availability</strong> — focuses on keeping data available rather than guaranteeing it is the most up to date. You might see some stale data while the data is being updated. In a highly available system, you are supposed to get a response unless the node itself is failing.</p>
</li>
<li><p><strong>Partition Tolerance</strong> - is the ability of a distributed system to continue to operate despite an arbitrary number of messages being lost, delayed, or dropped by the network between nodes.<br /> A partition (network partition) is a communication failure that divides the nodes of a distributed system into two or more separate groups (partitions), where nodes within a group can communicate but nodes across groups cannot.</p>
</li>
</ol>
<p><strong><em>CAP theorem states it is impossible for a distributed system to simultaneously provide more than two of these three guarantees: consistency, availability, and partition tolerance.</em></strong></p>
<ul>
<li><p><strong>C</strong>onsistency: Every read receives the most recent write or an error.</p>
</li>
<li><p><strong>A</strong>vailability: Every request receives a (non-error) response, without guarantee that it contains the most recent write.</p>
</li>
<li><p><strong>P</strong>artition Tolerance: The system continues to operate despite network failures (partitions) that isolate some nodes.</p>
</li>
</ul>
<p>This leads to three theoretical system configurations —</p>
<ol>
<li><p><strong>CP</strong> (Consistency &amp; Partition Tolerance) System - sacrifices availability to support consistency &amp; partition tolerance.</p>
</li>
<li><p><strong>AP</strong> (Availability &amp; Partition Tolerance) System - sacrifices consistency to support availability and partition tolerance.</p>
</li>
<li><p><strong>CA</strong> (Consistency &amp; Availability) System - sacrifices partition tolerance to support consistency &amp; availability.</p>
</li>
</ol>
<p>Now ideally, if a network partition never occurs, and data from node <code>n0</code> is automatically replicated to all other nodes, we could create a <strong>CA System,</strong> as both consistency and availability are achieved. However, in real-world distributed systems, network partitions are unavoidable, and thus we are forced to choose between consistency and availability.</p>
<p>If we choose consistency, we get a CP system; if we choose availability, we get an AP system. Consistency and availability are also interrelated: if we reduce the consistency of a system, it becomes more available, as stale data can be served, and if we favour availability, the data a client sees may not be up to date.</p>
<p>Now that we are left with consistency and availability, it is hard to say one system is better than the other, as it completely depends on the type of application you are building. A banking/financial app would require strong consistency, since it's better to make the customer wait than to show an incorrect balance. Similarly, for a social media feed, it makes much more sense to show something even if it is not the latest, rather than throwing errors all over the place.</p>
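<p>To make the trade-off concrete, here is a toy sketch (purely illustrative, not how any real database works) of a two-replica store that either rejects writes (CP) or serves stale reads (AP) during a partition:</p>

```javascript
// Toy two-replica store: "CP" rejects writes during a partition,
// "AP" accepts them and lets the second replica go stale.
class TinyStore {
  constructor(mode) {
    this.mode = mode; // "CP" or "AP"
    this.replicas = [new Map(), new Map()];
    this.partitioned = false;
  }

  write(key, value) {
    if (this.partitioned && this.mode === "CP") {
      // CP: better to be unavailable than to let replicas diverge.
      throw new Error("unavailable: cannot replicate during partition");
    }
    this.replicas[0].set(key, value);
    if (!this.partitioned) this.replicas[1].set(key, value);
    // AP during a partition: the write lands on replica 0 only.
  }

  read(replica, key) {
    return this.replicas[replica].get(key);
  }
}

const ap = new TinyStore("AP");
ap.write("balance", 100);
ap.partitioned = true;
ap.write("balance", 50); // accepted
console.log(ap.read(1, "balance")); // 100 (stale, but available)
```

<p>In CP mode the same second write throws instead, so the replicas never diverge: every successful read still returns 100.</p>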
]]></content:encoded></item><item><title><![CDATA[Does your code have debt?]]></title><description><![CDATA[Good code is an asset, while Bad code is a liability. The liability also has a technical name as Tech/Code Debt.
Code debt arises when the delivered code is not very optimal requires refactoring. This debt can be due to faster delivery timelines, mis...]]></description><link>https://blog.pavittarx.com/does-your-code-have-debt</link><guid isPermaLink="true">https://blog.pavittarx.com/does-your-code-have-debt</guid><category><![CDATA[code]]></category><category><![CDATA[debugging]]></category><category><![CDATA[refactoring]]></category><category><![CDATA[code smell ]]></category><category><![CDATA[development]]></category><dc:creator><![CDATA[Pavittar Singh]]></dc:creator><pubDate>Sat, 13 Nov 2021 19:35:12 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1636832006904/0pvuu-yVE.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Good code is an asset, while bad code is a liability. This liability also has a technical name: tech debt, or code debt.</p>
<p>Code debt arises when the delivered code is suboptimal and requires refactoring. This debt can be due to faster delivery timelines, missing documentation, or legacy/redundant code delivered without appropriate testing.</p>
<p>Code debt is similar to taking a loan. Businesses borrow money to fulfill ambitions that are not possible with the resources they currently have. Similarly, when the short-term gain is prioritized over long-term stability, you accumulate code debt.</p>
<p>Just as with monetary debt, you must be wise with code debt. Rushing out a feature to gain a market edge and build upon the experience you may gain is perfectly rational. However, not putting in enough time and effort for refactoring will lead to unmaintained code, unknown issues, and stretched timelines. Eventually, the code becomes a mess. </p>
<p>Finally, code debt should not be mistaken for a mess. A mess is a mess. Code debt is the result of rational decisions. A code mess results from non-maintenance, following bad practices, and not putting appropriate resources toward the needful. </p>
<p>#code #refactoring #debt #code-debt #cleancode #development</p>
]]></content:encoded></item><item><title><![CDATA[Using multiple Github accounts with SSH]]></title><description><![CDATA[When I started using Github, I used Github Desktop for sometime because that felt easy. Down the line, I learnt using git via CLI. And, I haven't touched the electron tool back ever. But, I was still using Github via HTTP vs SSH. 
I fell in love with...]]></description><link>https://blog.pavittarx.com/using-multiple-github-accounts-with-ssh</link><guid isPermaLink="true">https://blog.pavittarx.com/using-multiple-github-accounts-with-ssh</guid><category><![CDATA[GitHub]]></category><category><![CDATA[ssh]]></category><category><![CDATA[Git]]></category><category><![CDATA[GitLab]]></category><category><![CDATA[Developer]]></category><dc:creator><![CDATA[Pavittar Singh]]></dc:creator><pubDate>Wed, 06 Oct 2021 13:30:58 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1633526968710/hRWfqRdWG.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>When I started using Github, I used Github Desktop for some time because it felt easy. Down the line, I learnt to use <code>git</code> via the CLI, and I haven't touched the Electron tool since. But I was still using Github via <code>HTTP</code> instead of <code>SSH</code>. </p>
<p>I fell in love with <code>ssh</code> when I first discovered it. Once set up, you can use ssh to work with multiple repositories under the same account. The only downside is the initial setup, which might feel complicated if you are using it for the very first time. However, it saves you from the pain of password/token based authentication (...painful).</p>
<p>If you haven't used <code>ssh</code> before, these two articles will help you set up pretty quickly. </p>
<ol>
<li><a target="_blank" href="https://docs.github.com/en/authentication/connecting-to-github-with-ssh/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent">Generating a new SSH Key</a></li>
<li><a target="_blank" href="https://docs.github.com/en/authentication/connecting-to-github-with-ssh/adding-a-new-ssh-key-to-your-github-account">Adding the Generated Key to your Github Account</a></li>
</ol>
<p>So far so good; this works well for a single account. But what if you have to use multiple accounts? Personal and work? </p>
<ol>
<li>Use <code>ssh-keygen</code> to start creating your new ssh key. Note that this email should be the same as the one you use with your Github account. <pre><code class="lang-bash">     ssh-keygen -t rsa -C <span class="hljs-string">"youremail@domain.com"</span>
</code></pre>
</li>
<li><p>Give this ssh key a different name so that it does not conflict with or overwrite existing keys. All existing <code>ssh</code> keys can be found under the <code>~/.ssh</code> folder. I named my own "pavittarxnoauth".</p>
<pre><code class="lang-bash">   Enter file <span class="hljs-keyword">in</span> <span class="hljs-built_in">which</span> to save the key (/home/pavittarx/.ssh/id_rsa): pavittarxnoauth
</code></pre>
</li>
<li><p>In the next steps, provide a passphrase, or simply hit enter if you do not want to be asked for one during authentication.</p>
</li>
<li><p>Check that the key has been successfully created. If you cannot find the key in the <code>.ssh</code> folder, repeat steps 1-3.</p>
<pre><code class="lang-bash">   ls ~/.ssh
</code></pre>
</li>
<li><p>Start your ssh-agent if you have not already. </p>
<pre><code class="lang-bash">   <span class="hljs-built_in">eval</span> `ssh-agent -s`
</code></pre>
</li>
<li><p>Add the key to your ssh agent. </p>
<pre><code class="lang-bash">   ssh-add ~/.ssh/pavittarxnoauth
</code></pre>
</li>
<li><p>Navigate to <code>~/.ssh</code> directory.</p>
<pre><code>   <span class="hljs-built_in">cd</span> ~/.ssh
</code></pre></li>
<li><p>Add the following entry to the <code>config</code> file. You can create one if it doesn't exist. Replace pavittarxnoauth with your key-file name. Also, replace <code>github-noauth</code> with anything you prefer. </p>
<pre><code class="lang-vim">Host github-noauth
 HostName github.com
 User git
 IdentityFile ~/.ssh/pavittarxnoauth
</code></pre>
</li>
<li><p>Add the <code>~/.ssh/&lt;yourkeyfilename.pub&gt;</code> to your Github account as stated <a target="_blank" href="https://docs.github.com/en/authentication/connecting-to-github-with-ssh/adding-a-new-ssh-key-to-your-github-account">here</a>.</p>
</li>
<li><p>Clone your repo with ssh, or change the <code>git remote</code> and <code>git config</code> for the repository. </p>
<pre><code class="lang-bash">    git <span class="hljs-built_in">clone</span> git@github-noauth:&lt;your-github-username&gt;/&lt;your-git-repo&gt;
</code></pre>
<p>For an existing repository that has already been cloned:</p>
<ol>
<li>Change <code>git remote origin</code></li>
</ol>
<pre><code class="lang-bash">  git remote remove origin
  git remote add origin git@github-noauth:&lt;github-username&gt;/&lt;git-repo&gt;
</code></pre>
<ol>
<li>Change <code>git config</code></li>
</ol>
<pre><code class="lang-bash">  git config --<span class="hljs-built_in">local</span> user.email <span class="hljs-string">"your-github-email@domain.com"</span>
  git config --<span class="hljs-built_in">local</span> user.name <span class="hljs-string">"github-username"</span>
</code></pre>
</li>
</ol>
<p>You can use the same process to add multiple keys, with the same or different accounts, to Github. The same steps are reproducible for other git providers such as Bitbucket/Gitlab.</p>
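<p>For example, a <code>config</code> with one personal and one work account could look like this (the host aliases and key file names below are placeholders; use your own):</p>

```
# Personal account
Host github-personal
 HostName github.com
 User git
 IdentityFile ~/.ssh/personal-key

# Work account
Host github-work
 HostName github.com
 User git
 IdentityFile ~/.ssh/work-key
```

<p>Then <code>git clone git@github-personal:&lt;user&gt;/&lt;repo&gt;</code> authenticates with the personal key, while <code>git@github-work:&lt;org&gt;/&lt;repo&gt;</code> uses the work key.</p>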
]]></content:encoded></item><item><title><![CDATA[How code execution works in Javascript?]]></title><description><![CDATA[Javascript is asynchronous, that simply means that it does not wait. Well, it does but not very often. And, unless you ask it to await.
Javascript is like the language of Web. So, if you planning to learn Web Development. Javascript is one of the thi...]]></description><link>https://blog.pavittarx.com/how-code-execution-works-in-javascript</link><guid isPermaLink="true">https://blog.pavittarx.com/how-code-execution-works-in-javascript</guid><category><![CDATA[js]]></category><category><![CDATA[JavaScript]]></category><category><![CDATA[vanilla-js]]></category><category><![CDATA[Hashnode]]></category><category><![CDATA[Web Development]]></category><dc:creator><![CDATA[Pavittar Singh]]></dc:creator><pubDate>Sat, 14 Aug 2021 14:29:36 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1633527498407/C30bYOPiG.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Javascript is asynchronous, which simply means that it does not wait. Well, it does, but not very often, and only when you ask it to <code>await</code>.</p>
<p>Javascript is the language of the Web. So, if you are planning to learn Web Development, Javascript is one of the things you must consider learning. You will find plenty of tutorials online that teach you the semantics, syntax and programming of Javascript, but most of them skip over the fact that "Javascript is single-threaded and asynchronous". </p>
<h2 id="single-threaded-model">Single threaded model</h2>
<p>Javascript is single-threaded, which means it has one memory heap and one call stack. That comes down to this: you can only run a single task at a given time, and until it finishes, your program is stuck. </p>
<p>Consider the following code snippet: </p>
<pre><code class="lang-js">  <span class="hljs-number">1.</span> <span class="hljs-built_in">console</span>.log(<span class="hljs-string">"One"</span>);
  <span class="hljs-number">2.</span> <span class="hljs-built_in">console</span>.log(<span class="hljs-string">"Two"</span>);
  <span class="hljs-number">3.</span> <span class="hljs-built_in">console</span>.log(<span class="hljs-string">"Three"</span>);

<span class="hljs-comment">/* 
  Result: 

    One
    Two 
    Three
*/</span>
</code></pre>
<p>Here, Line 1 should finish executing before Line 2, and Line 2 should finish executing before Line 3. </p>
<p>However, you must know that even though Javascript runs on a single thread, there are certain cases where it behaves asynchronously. For example, see the below code snippet: </p>
<pre><code class="lang-js">
  <span class="hljs-number">1.</span> <span class="hljs-built_in">console</span>.log(<span class="hljs-string">"One"</span>);
  <span class="hljs-number">2.</span> <span class="hljs-built_in">setTimeout</span>(<span class="hljs-function">() =&gt;</span> <span class="hljs-built_in">console</span>.log(<span class="hljs-string">"Two"</span>), <span class="hljs-number">2000</span>);
  <span class="hljs-number">3.</span> <span class="hljs-built_in">console</span>.log(<span class="hljs-string">"Three"</span>);

  <span class="hljs-comment">/* 
   Result: 

    One
    Three
    Two
  */</span>
</code></pre>
<p>Here, this bit of code skips over Line 2, and executes Line 3 before Line 2 finishes. Now, that is asynchronous behaviour for you.  </p>
<p>However, for this to make sense, you must understand that <strong>Javascript is an interpreted language and not a compiled language like Java, C or C++.</strong> There is no separate compilation step for Javascript; instead, the code is interpreted line by line by the interpreter. Modern interpreters in some environments use  <a target="_blank" href="https://stackoverflow.com/questions/59807938/the-confusion-with-jit-compilation-in-v8">JIT (Just-in-Time)</a>  compilation, which converts Javascript code to executable code just before it runs. </p>
<p> The <a target="_blank" href="https://v8.dev/">V8 Engine</a>, written in C++, is used in Chrome and NodeJs environments to run Javascript. Firefox uses  <a target="_blank" href="https://spidermonkey.dev/">SpiderMonkey</a>. IE and Edge had their own Chakra engine. The newer Edge and all other Chromium-based browsers use the V8 engine under the hood. </p>
<p>These engines offer extra API features, which run tasks in the background while Javascript keeps executing. In browsers, these external features are often referred to as Web APIs. In the example above, <code>setTimeout()</code> is an external API. So, whenever the browser encounters code using <code>setTimeout</code>, it starts executing that task in the background, and when the task finishes, control is transferred back to Javascript. </p>
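<p>You can see this hand-off even with a delay of <code>0</code> milliseconds: the callback is queued behind the currently running synchronous code and runs only once the call stack is empty.</p>

```javascript
const order = [];

order.push("sync-1");

// A 0 ms timer does not run immediately: the callback is handed to the
// timer API, queued, and run by the event loop only once all synchronous
// code has finished and the call stack is empty.
setTimeout(() => {
  order.push("timeout");
  console.log(order); // ["sync-1", "sync-2", "timeout"]
}, 0);

order.push("sync-2");
```

<p>Even though the timer expires instantly, "timeout" is always logged after both synchronous pushes.</p>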
<p>For a deeper understanding of how this happens.</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://www.youtube.com/watch?v=8aGhZQkoFbQ">https://www.youtube.com/watch?v=8aGhZQkoFbQ</a></div>
<p> </p>
<p>Event Loop in action: http://latentflip.com/loupe</p>
<p>List of available Web APIs: https://developer.mozilla.org/en-US/docs/Web/API</p>
]]></content:encoded></item><item><title><![CDATA[Why you should avoid using CMS for your projects?]]></title><description><![CDATA[I will start with Strapi, and If you do not know it already. Strapi is a headless CMS, written on top of Nodejs. So, you can use it to create APIs, while the content can be managed by the CMS. It uses Koa (not express) under the hood for configuring ...]]></description><link>https://blog.pavittarx.com/why-you-should-avoid-using-cms-for-your-projects</link><guid isPermaLink="true">https://blog.pavittarx.com/why-you-should-avoid-using-cms-for-your-projects</guid><category><![CDATA[cms]]></category><category><![CDATA[headless cms]]></category><dc:creator><![CDATA[Pavittar Singh]]></dc:creator><pubDate>Sat, 14 Aug 2021 10:54:05 GMT</pubDate><content:encoded><![CDATA[<p>I will start with Strapi. If you do not know it already, Strapi is a headless CMS written on top of Nodejs. You can use it to create APIs, while the content is managed by the CMS. It uses Koa (not Express) under the hood for configuring your server. </p>
<p>It comes preconfigured with middleware, authentication and role-based access, so you don't have to configure a good part of it yourself. If your use case is simple enough, you can be up and running with Strapi in minutes. It does have the features to let you build complex apps; however, once your use case becomes complex, you may feel you would have been better off doing it with Express instead. Strapi is UI-friendly rather than code-friendly. After all, it's a CMS.  </p>
<p>Strapi uses a Content-Manager plugin that allows you to create collections as per your requirements. The default quickstart template comes preconfigured with <code>sqlite3</code>, but you can always configure your own database as well. I have been using MongoDb, and the sad bit is, all the collection information, as well as the roles and permissions information, is kept in the database rather than in the app. You might spend quite some time juggling between the backend (node project) and Strapi's own admin UI. </p>
<p>The interesting bit, however, is that Strapi creates basic CRUD operations for you. The APIs support sorting and filtering by default, but the syntax can be a bit confusing at times, and the documentation on the website is not very descriptive either. Still, it can be all that you need. But then again, if you want something other than basic CRUD, you end up writing your own custom APIs. </p>
<p>So, good so far, but here are the extremely bad bits -</p>
<ul>
<li>Strapi has its own system of recognizing environment variables, but it doesn't work as you would expect. </li>
<li>The project overall is heavy, since a lot of stuff is already configured for you.</li>
<li>The scope for configuring error response messages is very limited. You can do it for some use cases, or at places where you are writing your own APIs.</li>
<li>From a security perspective, the APIs are not very secure. Even though you have role-based access, you can make PUT requests to all the collections if you belong to that particular role. </li>
<li>For MongoDb, Strapi has a wrapper around <code>mongoose</code>, which is itself a wrapper around MongoDb's native Node APIs. </li>
<li>Even for custom APIs, Strapi does not support transactions. I was able to write a transaction, but it always ended in failure no matter what. </li>
<li>The reload time in the development environment is very high. It is a known problem, and the solutions provided in forums do not work. </li>
<li>You will not be able to build projects on AWS's <code>t2.nano</code> &amp; <code>t2.micro</code> instances. <code>t2.small</code> is needed at the very least, or the project will error out at the build step. </li>
</ul>
<p>Now, that was Strapi. But you can see similar issues in other CMSs as well. </p>
<ul>
<li><p>If you are concerned about the security of your project, then using a CMS should certainly be a "NO". Why? Simply because the CMS's vulnerabilities are your project's vulnerabilities. </p>
</li>
<li><p>The glitter and shine up front are problems in the dark. No project advertises its vulnerabilities, and the same goes for every CMS. You only come to know when you work with them. The simplicity at the start ends up making simple tasks complex as your project evolves. </p>
</li>
<li><p>Your control over the project is limited, since everything you do must be done the way the CMS allows. Your options are also limited in terms of the approaches you can take to implement a particular feature in your project. </p>
</li>
<li><p>CMSs add unnecessary bloat to your projects, which can also result in performance issues. </p>
</li>
<li><p>CMSs tend to follow a monolithic architecture, so scaling can be an issue unless your CMS supports it. </p>
</li>
</ul>
<p>I hope you got the gist of it by now... the rest is for later.</p>
]]></content:encoded></item><item><title><![CDATA[Understanding the Client-Server model of application development]]></title><description><![CDATA[For someone starting out their web development journey, he must first understand how web apps work. 
The Client-Server model act as the core of every web app. Be it a social networking app as Facebook, Instagram or Whatsapp; an online MMORPG game you...]]></description><link>https://blog.pavittarx.com/understanding-the-client-server-model-of-application-development</link><guid isPermaLink="true">https://blog.pavittarx.com/understanding-the-client-server-model-of-application-development</guid><category><![CDATA[Web Development]]></category><category><![CDATA[Applications]]></category><category><![CDATA[web application]]></category><category><![CDATA[HTML5]]></category><category><![CDATA[software architecture]]></category><dc:creator><![CDATA[Pavittar Singh]]></dc:creator><pubDate>Sun, 25 Apr 2021 15:14:23 GMT</pubDate><content:encoded><![CDATA[<p>Anyone starting out on their web development journey must first understand how web apps work. </p>
<p>The Client-Server model acts as the core of every web app. Be it a social networking app such as Facebook, Instagram or Whatsapp; an online MMORPG game you love to play; apps such as Zoom and Google Meet; or anything else that has got something to do with the internet. But what is this Client-Server model?</p>
<p>The client-server model tells us how the server and the client interact with each other. In order to understand this, you must first understand what a client is. </p>
<p>A client is a device that makes requests for data or information. This data can be a file, an image, a webpage or anything else. Say, for example, you make a Google Search from your phone. Your phone is the client here: it made the request for the information you are searching for. </p>
<p>A server resides on the other end of this equation. The server is the one that provides this information. The reason for this is simple enough: if some information has to be provided to you, it must exist somewhere, and you must have access to it. Taking the earlier example, when you make a Google Search, the information you are asking for is on Google's servers. The server sends it to you through the internet. </p>
<p>Hence, a client is the one that makes requests, and the server on the other end is the one that responds to those requests. A client can be a smartphone, a laptop, a desktop, an IOT device, etc. Now, have a look at the picture below for a visual representation of the model. </p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1619363535150/A1aCKTiR2.jpeg" alt="Frame 1.jpg" /></p>
<p>Thank you for the read. More hearty stuff coming soon. Do subscribe for quick updates. Also, let me know what other topics I should cover for you; I'll be happy to consider them.</p>
<h2 id="follow-me">Follow Me:</h2>
<p>Github - https://github.com/pavittarx
Find me on Linkedin - https://linkedin.com/in/pavittarx</p>
]]></content:encoded></item></channel></rss>