Offline-First + Sync Pattern
How to design apps that work without network connectivity and reconcile state when connectivity returns.
Offline-First + Sync Pattern
How to design apps that work without network connectivity and reconcile state when connectivity returns.
TL;DR (human)
Offline-first means the local copy is the truth; sync to remote when possible. Designed right: instant interactions, work-anywhere, automatic reconciliation. Designed wrong: lost data, conflicting state, user confusion. Three problems to solve: local persistence, sync protocol, conflict resolution. CRDTs simplify the last; explicit policy works otherwise.
For agents
Three concerns
| Concern | Question | Tools |
|---|---|---|
| Local persistence | Where does data live offline? | IndexedDB (web), SQLite (mobile), file system (desktop) |
| Sync protocol | How do client + server reconcile? | Custom REST + diff, GraphQL subscriptions, CouchDB-style replication |
| Conflict resolution | When local and remote disagree, who wins? | CRDT auto-merge, last-writer-wins, manual resolution UX |
Local persistence
Web: IndexedDB (browser-native; large quotas; structured) > localStorage (small; sync). Wrappers: Dexie, idb, RxDB.
Mobile: SQLite (cross-platform; mature; queryable) > AsyncStorage (key-value only). Wrappers: WatermelonDB, Realm.
Desktop: SQLite or file-based; per-platform OS-native options.
The local store mirrors the server's data shape. Reads are local; writes are local-first (then sync).
Sync protocol
Sync protocol is the contract between client + server. Three styles:
Full replication: client downloads everything for its scope. Re-sync == replace. Works for small datasets per user.
Incremental sync:
- Client tracks last-sync timestamp / version.
- Server returns changes since.
- Client applies; updates last-sync.
Requires server-side change log (per event-streaming-pattern.md) or efficient timestamp queries.
Operational transform / CRDT replication: each side captures operations or CRDT updates; merge convergent regardless of order.
Operation log on client
Local writes captured as operations (not just state mutations):
type LocalOp = {
id: string;
type: "create" | "update" | "delete";
entity: string;
payload: unknown;
clientTimestamp: number;
status: "pending" | "synced" | "failed";
};Operations sit in a local queue. When online: replay in order to server; mark synced. Server returns canonical-state diffs.
Conflict resolution strategies
| Strategy | When |
|---|---|
| Last-writer-wins (LWW) | Strict total order via timestamps; data class allows loss |
| First-writer-wins | "Once set, immutable" semantics |
| CRDT | Math guarantees convergence regardless of order |
| Manual | UI shows conflict; user picks |
| Custom semantic | Domain-specific merge |
CRDT is the cleanest if your data shape fits (counters, text editors, lists, sets). Outside those shapes: LWW + careful UX.
Clock discipline
Conflict resolution often uses timestamps. Clock skew breaks it:
- Use server time when sync happens (server stamps).
- Hybrid Logical Clocks (HLC) for distributed correctness without strict NTP.
- Lamport / vector clocks for strict ordering.
Don't use client wall-clock alone — devices drift, users adjust manually.
Sync state machine per record
local-only → syncing → synced
local-only → syncing → failed → retry
synced → modified-locally → syncing → ...Each record's status determines UI behaviour:
local-only: badge "Saving locally".syncing: subtle activity indicator.synced: clean.failed: error state with retry.
Offline UX
Tell the user:
- Network status: visible (header banner when offline).
- Sync progress: subtle (post-success), explicit (when active sync is non-trivial).
- Per-record state: badges or icons for in-progress / failed.
- Conflicts (rare): explicit UX to resolve.
Never silently lose data. If a sync fails permanently, surface it.
Authentication offline
Tricky:
- Tokens issued before going offline still work locally.
- Refresh fails offline; access token may expire mid-offline session.
- Permission changes on server don't reach client until online.
Pattern: cached permissions; long-lived offline access token; re-auth when online if expired during offline window. Step-up operations (per ../security/session-mgmt-pattern.md) require online.
Service workers (web)
For PWAs:
- Service worker intercepts requests.
- Cache strategies: cache-first (static assets), network-first (API), stale-while-revalidate.
- Background sync API: queue writes; replay when online.
Service workers add complexity; only adopt when offline is a core product requirement.
Sync at scale
For each tenant in multi-tenant systems:
- Sync state per (user, device).
- Sync windows: tenants don't see other tenants' streams.
- Large dataset users may need progressive sync (paginated by time / entity).
Local search
Local data enables local search:
- IndexedDB indexes for fast lookup.
- SQLite FTS5 for full-text search.
- Bloom filters for membership tests in larger datasets.
Search latency dominates UX; local search makes the app feel instant.
When NOT offline-first
- Single-session web tools.
- Sensitive data that shouldn't sit on client devices.
- Truly real-time-only (live video).
- Trivial CRUD where online assumption is acceptable.
Offline-first is a significant architecture commitment. Adopt when network unreliability is part of the product reality.
Common failure modes
- Local data + no sync UI: user wonders if their work persisted. → Per-record status.
- LWW without HLC: clock skew = wrong winner. → HLC or server stamps.
- No conflict UX for high-stakes data: silent loss. → Manual resolution; never silently overwrite.
- No retry policy: failed sync stays failed. → Exponential backoff; resume on connectivity.
- Token expiry during offline: user logged out unexpectedly. → Long offline window; clear re-auth on return.
- All data synced for every user: storage explodes; sync slow. → Scope (tenant; recent; visible).
- No "i lost this" recovery path: data gone if device wiped. → Server is canonical; local is cache; loss recoverable from server if synced.
Tooling stack (typical)
| Concern | Tool |
|---|---|
| Local DB (web) | IndexedDB direct, Dexie, idb, RxDB |
| Local DB (mobile) | SQLite, WatermelonDB, Realm |
| Sync framework | PouchDB / CouchDB, Replicache, Electric SQL, PowerSync |
| CRDT | Yjs (Y.js), Automerge, Loro |
| Operation queue | Custom; or sync framework's built-in |
| Service worker | Workbox |
Adoption path
- Day 0: no offline support. Online-required. Document.
- If offline needed: pick scope (read-only offline first; then writes; then full offline).
- Choose strategy: LWW for simple; CRDT for collaborative; manual UX for high-stakes.
- Per-record sync state in UI.
- Test: throttled / disconnected / re-connect cycles + multi-device.
- Drill: conflict scenarios; data loss recovery.
See also
distributed-data-pattern.md— eventual consistency.event-streaming-pattern.md— operation log + replay.../security/session-mgmt-pattern.md— offline auth concerns.../ui-ux/empty-states-pattern.md— offline as an empty/error state.anti-overengineering.md— adopt offline-first only with real need.
Multi-Region Pattern
How to operate across geographic regions for latency, availability, and data sovereignty — without losing your mind.
Platform Engineering + Internal Developer Platform Pattern
How to build the layer between cloud / infra and product engineers, so product teams ship fast without re-learning infra each time.