building location based services at scale

Location-based services (LBS) have become an inherent part of our daily lives. From shopping orders, booking taxis, and finding nearby stores to building travel plans—everything is built upon location-based services.

Going down this rabbit hole, your design choices will depend on the type of location service you are building. Is it a static location (a shop or business that doesn't move) or a dynamic location (nearby friends, a food order being delivered, a moving taxi)?

Let’s start with mapping static locations. The first intuitive idea is to draw a circle of a certain radius (say, 5km) and search for any businesses within that circle. This translates to querying by latitude and longitude in a SQL database. However, this requires searching 2D data; you must query for longitudes and latitudes separately and then calculate intersections to find the locations of interest.

The Problem: As the dataset grows, this becomes both slow and compute-intensive. While applying database indexes can improve performance, we are still left with a massive number of data points to intersect, which doesn't solve the scalability issue.

The earlier approach shows us that 2D search is inefficient. The solution? Converting that 2D data into a 1D string. This is exactly what specialized databases do (e.g., Geo operations in Redis, PostGIS extension for Postgres).

What is a Geohash? It is an indexing method for geographical data. Generally, there are two types of spatial indexing:

Hash-Based: Even Grid, Geohash, Cartesian Tiers, etc.
Tree-Based: Quadtree, Google S2, R-Tree, etc.

Broadly speaking, these methods divide the map into smaller areas and build indexes for faster searching. Geohash, Quadtree, and Google S2 are the most popular.

Even Grid

This method works by dividing the map into a grid of multiple smaller, evenly divided areas (buckets). An area can contain multiple businesses, but a specific business belongs to only one grid cell.

Each object (user, restaurant, taxi) is assigned to a cell based on GPS coordinates. To find a taxi, we simply look for taxis in the cell you are currently in. This makes the search extremely fast.

The Drawback: While the grids are of even size, the data is not evenly distributed. A grid in Delhi might hold 10,000 items, while a grid in rural area might hold zero. This creates "hot spots" and empty memory. Additionally, if a user is on the edge of a grid, finding the nearest items requires checking neighboring cells, which adds complexity to the logic.

GeoHash

Geohash is generally superior to the simple Even Grid approach. It reduces 2D latitude/longitude data into a 1D string of letters and digits.

It works by recursively dividing areas into smaller and smaller grids with each additional character. Geohash is like a digital map that lets you zoom in. It supports 12 precision levels, though levels 4-6 are sufficient for most LBS applications.

Level (Char Length)	Cell Size (Approx)	Use Case
1	5,000km x 5,000km	Continent / Large Region
2	1,250km x 625km	Large State / Country
3	156km x 156km	Cluster of Cities
4	39km x 19km	A City
5	4.9km x 4.9km	A District / Neighborhood
6	1.2km x 0.6km	A few city blocks (Common for LBS)
7	150m x 150m	A large building footprint
8	38m x 19m	A house property
9	4.8m x 4.8m	A specific room / parking spot

We start by dividing the world into 4 quadrants, based on bits (0 or 1) -

Go Left: Longitude range [-180, 0] is represented by 0
Go Right: Longitude range [0, 180] is represented by 1
Go Down: Latitude range [-90, 0] is represented by 0
Go Up: Latitude range [0, 90] is represented by 1

We continue dividing these quadrants recursively until we reach the required granularity. These bits are then grouped (5 bits at a time) to form characters using Base32 encoding. The alphabet used is 0-9 and b-z (letters like a, i, l, and o are missing to avoid confusion).

Level 1: The world is divided into 32 rectangles (e.g., d, e, f).
Level 2: Each rectangle is further divided into 32 smaller ones (e.g., da, db).

This is a global standard. For example:

d: Covers the Eastern US and the Atlantic.
e: Covers Europe and West Africa.
f: Covers Eastern Canada and Greenland.

The Advantage: This allows for Prefix Searching. If User A is at tdr12 and User B is at tdr15, you know they are close because they share the prefix tdr1.

The Edge Case (Boundary Issues): Geohash works best most of the time, but like physical countries, it has boundary issues. Geohash guarantees that if two hashes share a long prefix, they are close. However, the opposite is not true: if there is no shared prefix, it does not mean the places are far apart.

For example, take two people standing in Greenwich Park (the Prime Meridian):

Person Left: Geohash gcpvj...
Person Right: Geohash u10hb...

A query SELECT * WHERE geohash LIKE 'gcpv%' will miss the person standing one meter away because their hash starts with u.

To solve this, we cannot rely on the prefix alone. The standard solution is to fetch businesses not just from the current cell, but also from the 8 neighboring cells. This ensures complete coverage and can be done in constant time.

Up Next …

Quad Tree
Google S2

Design choices for building Location Based Services / Part I

Even Grid

GeoHash

Up Next …

Comments

More from this blog

HTTP/3: Faster Connections and Better Mobility

Don’t Block, Just Queue: The Art of Asynchronous Traffic Control.

Design Choices for Location Based Services III

Design Choices for Location Based Services / Part II

Command Palette

Even Grid

GeoHash

Up Next …

Comments

More from this blog