Every time you sign up for a Google account and type a username, the UI tells you instantly whether it’s taken. Google has over 2 billion accounts. How do you search 2 billion records in microseconds, from anywhere in the world? The answer is a Bloom filter — and I built an interactive demo that shows exactly how it works.
The Pipeline
When you type a username in the demo, it executes the same four-step pipeline that Google likely uses:
- Debounce — waits 300ms for you to stop typing
- Bloom filter — instant in-memory check against a 100-million-bit filter (~12MB). If the bit pattern doesn’t match, the username is definitely available — no database query needed
- Prefix search — DuckDB-WASM scans a Parquet file for similar usernames and suggests alternatives
- Database confirmation — full query against DuckDB to eliminate false positives
The key insight is step 2. The Bloom filter eliminates most lookups before they ever hit the database.
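The pipeline above (minus the debounce, which lives in the UI layer) can be sketched in Python. All names here are hypothetical: a plain set stands in for the Bloom filter and sqlite3 stands in for DuckDB, since the real demo runs in JavaScript in the browser.

```python
import sqlite3

def check_username(username, bloom, conn):
    """Sketch of the check pipeline: Bloom filter first, database only on a hit."""
    # Step 2: Bloom filter -- a "no" here is definitive, so skip the database.
    if username not in bloom:
        return {"status": "available", "suggestions": []}
    # Step 3: prefix search for alternative-name suggestions.
    rows = conn.execute(
        "SELECT name FROM users WHERE name LIKE ? AND name != ? LIMIT 5",
        (username + "%", username),
    ).fetchall()
    # Step 4: exact lookup to rule out a Bloom-filter false positive.
    taken = conn.execute(
        "SELECT 1 FROM users WHERE name = ?", (username,)
    ).fetchone()
    return {"status": "taken" if taken else "available",
            "suggestions": [r[0] for r in rows]}
```

The fast path never touches the database: a miss in step 2 returns immediately, which is exactly why the filter eliminates most lookups.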
What’s a Bloom Filter?
A Bloom filter is a probabilistic data structure — a bit array where you trade a small chance of false positives for massive space savings and instant lookups.
To add an item, run it through k hash functions and set those k bits to 1. To check an item, run the same hash functions: if all k bits are 1, the item probably exists; if any bit is 0, it definitely doesn’t.
False positives are possible (other items may have set the same bits). False negatives are impossible. That’s the property that makes this useful: when the filter says “available,” you can trust it completely.
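The add/check mechanics fit in a few lines. This is a minimal sketch, deriving the k bit positions by salting a SHA-256 hash with an index — an illustrative choice, not necessarily how the demo's filter is built:

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: m bits, k hash functions derived from SHA-256."""
    def __init__(self, m, k):
        self.m, self.k = m, k
        self.bits = bytearray((m + 7) // 8)

    def _positions(self, item):
        # Salt the hash with the function index to get k independent positions.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h, "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        # All k bits set -> "probably present"; any bit 0 -> definitely absent.
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))
```

Note that `add` only ever sets bits and never clears them, which is why a negative answer can always be trusted.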
The Numbers
The demo runs against 10 million usernames with a Bloom filter tuned to ~0.8% false positive rate using 7 hash functions. The entire filter fits in 12MB of memory. The Parquet file with all 10 million usernames is ~26.5MB thanks to ZSTD compression.
At Google’s scale of 2 billion accounts, the same approach needs about 2.4GB of RAM, which fits on a single server. Replicate that filter across ~200 points of presence worldwide, and every username check resolves in microseconds without touching a database.
Why DuckDB-WASM
The demo runs entirely in the browser. DuckDB-WASM handles the Parquet file — scanning, prefix matching, and exact lookups — all client-side. No server, no API calls, no latency. It’s the same approach I used in Duck-UI: push the compute to the browser and let the data stay local.
DuckDB’s Parquet support is what makes this practical. The 10 million usernames compress down to a single 26.5MB file that DuckDB reads directly without loading it all into memory. Try doing that with JSON.
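Because DuckDB can query Parquet in place, the prefix search and exact lookup from the pipeline reduce to plain SQL. The file and column names below are assumptions for illustration, not taken from the demo’s source:

```sql
-- Step 3: prefix search for alternative suggestions.
SELECT username FROM 'usernames.parquet'
WHERE username LIKE 'caio%' LIMIT 5;

-- Step 4: exact lookup to rule out a Bloom-filter false positive.
SELECT COUNT(*) FROM 'usernames.parquet'
WHERE username = 'caio';
```

Pointing a query directly at a Parquet path like this is standard DuckDB; it reads only the row groups and columns it needs rather than the whole file.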
The Stack
Deliberately minimal: plain HTML and JavaScript for the frontend, DuckDB-WASM for the database layer, Python with Faker and DuckDB for data generation. No framework, no build step. Serve it with python -m http.server and it works.
The data generation script creates 10 million realistic usernames and builds both the Parquet file and the Bloom filter. If you want to regenerate everything from scratch: uv run generate_data.py.
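For a feel of the generation step, here is a stdlib-only toy version: the real script uses Faker for realistic names and DuckDB to write Parquet, while this sketch just combines a few sample words with random digits.

```python
import random

def generate_usernames(n, seed=42):
    """Toy username generator (the real script uses Faker for realism)."""
    rng = random.Random(seed)
    words = ["alex", "sam", "kai", "max", "leo", "mia", "ana", "joe"]
    seen = set()  # a set guarantees uniqueness, as a usernames table requires
    while len(seen) < n:
        seen.add(rng.choice(words) + str(rng.randrange(10_000)))
    return sorted(seen)
```

In the real pipeline the resulting names would be inserted both into the Parquet file and into the Bloom filter, so the two stay in sync.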
Why Build This
I built this as a teaching tool. Bloom filters are one of those data structures that sound theoretical until you see them in action. The gap between “probabilistic bit array” and “Google checks 2 billion usernames in microseconds” is huge — and the demo bridges it. You type a username, you see the pipeline execute step by step, and suddenly the concept clicks.
Try it at bloom.caioricciuti.com or check the source on GitHub.