Show HN: Turbolite – a SQLite VFS serving sub-250ms cold JOIN queries from S3

English · HackerNews

I built a SQLite VFS in Rust that serves cold queries directly from S3 with sub-second performance, and often much faster.

It’s called turbolite.

It is experimental, buggy, and may corrupt data.

I would not trust it with anything important yet.

I wanted to explore whether object storage has gotten fast enough to support embedded databases over cloud storage.

Filesystems reward tiny random reads and in-place mutation.

S3 rewards fewer requests, bigger transfers, immutable objects, and aggressively parallel operations where bandwidth is often the real constraint.

This was explicitly inspired by turbopuffer’s ground-up S3-native design: https://turbopuffer.com/blog/turbopuffer

The use case I had in mind is lots of mostly-cold SQLite databases (database-per-tenant, database-per-session, or database-per-user architectures), where keeping a separate attached volume for each inactive database feels wasteful. turbolite assumes a single write source and is aimed much more at “many databases with bursty cold reads” than “one hot database.”

Instead of doing naive page-at-a-time reads from a raw SQLite file, turbolite introspects SQLite B-trees, stores related pages together in compressed page groups, and keeps a manifest that is the source of truth for where every page lives.
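A minimal sketch of what such a manifest might look like. The type and field names here are illustrative assumptions, not turbolite’s actual code:

```rust
use std::collections::HashMap;

/// Hypothetical manifest entry: which S3 object (page group) holds a page,
/// and which compressed frame inside that object.
#[derive(Debug, Clone, PartialEq)]
struct PageLocation {
    group_key: String, // S3 object key of the compressed page group
    frame_index: u32,  // seekable-zstd frame within that object
}

/// Source of truth mapping every SQLite page number to its location.
#[derive(Default)]
struct Manifest {
    pages: HashMap<u32, PageLocation>,
}

impl Manifest {
    fn record(&mut self, page_no: u32, group_key: &str, frame_index: u32) {
        self.pages.insert(
            page_no,
            PageLocation { group_key: group_key.to_string(), frame_index },
        );
    }

    fn locate(&self, page_no: u32) -> Option<&PageLocation> {
        self.pages.get(&page_no)
    }
}

fn main() {
    let mut m = Manifest::default();
    m.record(7, "db1/groups/users-data-0.zst", 2);
    let loc = m.locate(7).unwrap();
    println!("page 7 -> {} frame {}", loc.group_key, loc.frame_index);
}
```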

On a cache miss, turbolite uses seekable zstd frames and S3 range GETs, so fetching one needed page does not require downloading an entire object.
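For example, the byte range for a single frame can be derived from a per-object frame table. This is a sketch under assumptions about the layout, not turbolite’s actual code:

```rust
/// Given a frame table of (compressed_offset, compressed_len) pairs for each
/// seekable-zstd frame in one S3 object, build the HTTP Range header value
/// that fetches just the frame holding the needed page.
fn frame_range(frame_table: &[(u64, u64)], frame_index: usize) -> Option<String> {
    let (off, len) = *frame_table.get(frame_index)?;
    // S3 Range requests are inclusive on both ends: bytes=start-end
    Some(format!("bytes={}-{}", off, off + len - 1))
}

fn main() {
    // Three compressed frames at offsets 0, 4096, 9000.
    let table = [(0u64, 4096u64), (4096, 4904), (9000, 1200)];
    println!("{}", frame_range(&table, 1).unwrap()); // bytes=4096-8999
}
```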

At query time, turbolite can also pass storage operations from the query plan down to the VFS to frontrun downloads for indexes and large scans in the order they will be accessed.

You can tune how aggressively turbolite prefetches.

For point queries and small joins, it can stay conservative and avoid prefetching whole tables.

For scans, it can get much more aggressive.
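The tuning described above might be expressed as a policy knob along these lines. The names and the decision rule are my own illustrative assumptions, not turbolite’s API:

```rust
/// Hypothetical prefetch-aggressiveness setting.
#[derive(Clone, Copy, Debug, PartialEq)]
enum PrefetchMode {
    Conservative, // point queries / small joins: fetch only groups actually touched
    Aggressive,   // scans: pull entire table or index page groups ahead of time
}

/// Decide how many of a table's page groups to request up front.
fn groups_to_prefetch(mode: PrefetchMode, touched: usize, total: usize) -> usize {
    match mode {
        PrefetchMode::Conservative => touched.min(total),
        PrefetchMode::Aggressive => total,
    }
}

fn main() {
    // A point query touching 2 of a table's 40 groups:
    println!("{}", groups_to_prefetch(PrefetchMode::Conservative, 2, 40)); // 2
    println!("{}", groups_to_prefetch(PrefetchMode::Aggressive, 2, 40));   // 40
}
```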

It also groups pages by page type in S3.

Interior B-tree pages are bundled separately and loaded eagerly.

Index pages prefetch aggressively.

Data pages are stored by table.

The goal is to make cold point queries and joins decent, while making scans less awful than naive remote paging would.
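The page-type split above can be sketched using the flag byte at the start of a SQLite b-tree page header. The flag values (2, 5, 10, 13) come from the SQLite file-format documentation; the bucket names and policy comments are illustrative, not turbolite’s code:

```rust
/// Grouping buckets matching the description above.
#[derive(Debug, PartialEq)]
enum GroupKind {
    Interior,  // interior b-tree pages: bundled separately, loaded eagerly
    IndexLeaf, // index leaf pages: prefetched aggressively
    TableLeaf, // table data pages: stored grouped by table
    Other,     // freelist, overflow, pointer-map pages, etc.
}

/// Classify a page by the first byte of its b-tree page header.
fn classify(flag: u8) -> GroupKind {
    match flag {
        0x02 | 0x05 => GroupKind::Interior, // interior index / interior table
        0x0A => GroupKind::IndexLeaf,       // leaf index
        0x0D => GroupKind::TableLeaf,       // leaf table
        _ => GroupKind::Other,
    }
}

fn main() {
    println!("{:?}", classify(0x0D)); // TableLeaf
}
```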

On a 1M-row / 1.5GB benchmark on EC2 + S3 Express, I’m seeing results like sub-100ms cold point lookups, sub-200ms cold 5-join profile queries, and sub-600ms scans, all from an empty cache.

It’s somewhat slower on normal S3/Tigris.

Current limitations are pretty straightforward: it’s single-writer only, and it is still very much a systems experiment rather than production-ready software.
