Who is Daniel Astudillo?

Daniel Astudillo is a software engineer based in New York City. He currently works at S&P Global building data platforms and full-stack applications, and previously built payment and benefit systems at Visa.

What technologies does Daniel Astudillo work with?

Daniel works across the stack with React, TypeScript, and Next.js on the frontend, and .NET Core, C#, Spring Boot, and Java on the backend. He has deep experience with PostgreSQL, BigQuery, gRPC, and distributed messaging systems.

What has Daniel Astudillo built?

Daniel has taken a data API from a 21-second worst case to roughly 200–300ms (Storage Write API, then PostgreSQL), built a real-time event pipeline processing 100K+ daily events at 99.99% uptime, and modernized Visa payment and eligibility APIs at production scale (including paths above 20M requests per month). He writes about this work on his blog.

What is Daniel Astudillo's educational background?

Daniel graduated from Williams College with a Bachelor of Arts in Computer Science and Mathematics.

March 2026 · Updated June 2026 · 9 min read · Start here

Building a collaborative editor with CRDTs

Real-time collaboration is a relay problem, not a referee problem. Yjs CRDTs, an append-only update log, and snapshot + tail replay for fast joins.

Real-time
CRDT
Yjs
Node.js
PostgreSQL

Two cursors in the same buffer is where “just broadcast the string” stops working. I rebuilt a Monaco-based editor around Yjs so concurrent keystrokes converge without a server-side merge referee.

#TL;DR

I built a multi-user code editor (Monaco + React) where concurrent edits must converge without a central "winner." The stack:

Yjs for conflict-free replicated text (Y.Text + y-monaco binding)
Socket.IO relay — binary yjs-update frames, not full-document overwrites
Neon Postgres append-only document_updates log with monotonic sequence numbers
Object storage snapshots (default: every 50 updates or 30 seconds) for fast reconnects
Clerk JWT verified on both HTTP and the WebSocket handshake

The first implementation was last-write-wins. It worked in demos and failed under two cursors. The production-shaped version trades storage complexity for correctness you don't have to think about.

#Why not "send the whole document"?

Diagram: Client A (full string overwrite) → Relay server (last write wins) → Client B (lost / duplicated chars)

Two users type at once and you get lost characters, duplicated text, or forked state. The server becomes a referee: whoever writes last wins, and "last" depends on network timing.

Operational Transformation (OT) can solve this — Google Docs famously uses OT variants — but OT requires transforming each operation against concurrent ops you haven't seen, often with a central ordering authority. It's powerful and notoriously easy to get subtly wrong.

CRDTs (Conflict-free Replicated Data Types) take a different bet: design the data so concurrent operations commute. Apply them in any order on any replica, and every replica converges to the same document. The server stops adjudicating and starts relaying.

#Yjs: CRDTs without implementing CRDTs from scratch

I used Yjs rather than hand-rolling a text CRDT. Yjs provides:

A replicated Y.Doc with shared types (Y.Text, maps, arrays)
Binary update encoding — compact deltas via lib0 varints, not JSON patches over full strings
State vector — per-client clock; encodeStateAsUpdate(doc, remoteStateVector) sends only missing ops
y-monaco — binds Monaco's model to Y.Text so keystrokes become CRDT ops automatically
y-protocols awareness — cursor/selection presence as ephemeral metadata

The mental shift: stop thinking in buffer indices. Indices shift on every edit. CRDT text assigns each insert a stable, totally ordered ID (Lamport-style clock + client id). Concurrent inserts at the "same" spot get distinct IDs; both survive; every replica merges to the same order.
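To make the "stable IDs instead of indices" idea concrete, here is a toy sketch. It is not Yjs's algorithm, and every name in it (`ToyText`, `InsertOp`, the numeric `pos` key) is hypothetical. It assumes each insert carries a unique (clientId, clock) pair plus a position key, and it only illustrates why ops that form a set can be applied in any order, duplicates included.

```typescript
// Toy position-keyed text CRDT. Hypothetical names; NOT how Yjs works internally.
type InsertOp = { clientId: number; clock: number; pos: number; char: string };

class ToyText {
  private ops = new Map<string, InsertOp>();

  // Applying an op is a keyed set insertion: idempotent and order-independent.
  apply(op: InsertOp): void {
    this.ops.set(`${op.clientId}:${op.clock}`, op);
  }

  // Render by sorting on position, breaking ties by clientId and then clock.
  render(): string {
    return Array.from(this.ops.values())
      .sort((a, b) => a.pos - b.pos || a.clientId - b.clientId || a.clock - b.clock)
      .map((op) => op.char)
      .join("");
  }
}

const base: InsertOp[] = [
  { clientId: 0, clock: 1, pos: 0.25, char: "x" },
  { clientId: 0, clock: 2, pos: 0.75, char: "y" },
];
// Two clients insert concurrently at the "same" spot.
const fromA: InsertOp = { clientId: 1, clock: 1, pos: 0.5, char: "A" };
const fromB: InsertOp = { clientId: 2, clock: 1, pos: 0.5, char: "B" };

const replica1 = new ToyText();
const replica2 = new ToyText();
[...base, fromA, fromB].forEach((op) => replica1.apply(op));
// Different arrival order, plus a duplicate delivery of fromB.
[fromB, ...base, fromA, fromB].forEach((op) => replica2.apply(op));

console.log(replica1.render(), replica2.render()); // xABy xABy
```

Both inserts survive, and the tie-break on clientId gives every replica the same order without a server choosing a winner.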

The server never parses character positions. It persists and forwards opaque binary updates.

Incremental sync on reconnect:

TypeScript

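import * as Y from "yjs";
// State vector: a compact summary of what localDoc already has
const stateVector = Y.encodeStateVector(localDoc);
// Encode only the ops in remoteDoc that are missing from that summary
const diff = Y.encodeStateAsUpdate(remoteDoc, stateVector);
Y.applyUpdate(localDoc, diff); // diff holds only what localDoc hasn't seen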
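The state-vector exchange can be pictured without any library. The sketch below is a simplified model, not Yjs's binary encoding: `Op`, `toStateVector`, and `diffAgainst` are hypothetical names, and it assumes per-client clocks start at 1 and arrive without gaps.

```typescript
// Toy model: an op is identified by (clientId, clock).
type Op = { clientId: number; clock: number; payload: string };
// State vector: for each client, the highest clock this replica has applied.
type StateVector = Map<number, number>;

function toStateVector(applied: Op[]): StateVector {
  const vector: StateVector = new Map();
  for (const op of applied) {
    vector.set(op.clientId, Math.max(vector.get(op.clientId) ?? 0, op.clock));
  }
  return vector;
}

// Return only the ops the other side has not seen, judged by its state vector.
function diffAgainst(source: Op[], remoteVector: StateVector): Op[] {
  return source.filter((op) => op.clock > (remoteVector.get(op.clientId) ?? 0));
}

const first: Op = { clientId: 1, clock: 1, payload: "h" };
const serverOps: Op[] = [
  first,
  { clientId: 1, clock: 2, payload: "i" },
  { clientId: 2, clock: 1, payload: "!" },
];
// The reconnecting client went offline after seeing only the first op.
const clientOps: Op[] = [first];

const missing = diffAgainst(serverOps, toStateVector(clientOps));
console.log(missing.map((op) => op.payload).join("")); // i!
```

The reconnecting side sends a small summary and receives only the gap, which is what keeps catch-up traffic proportional to what was missed rather than to document size.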

#Architecture

Client: React · Monaco · y-monaco
Node relay: JWT · append log · fan-out
Postgres: append-only log
Object storage: periodic snapshots

#Join flow: snapshot + tail replay

When a client joins a document, the server sends doc-init:

Latest snapshot bytes from object storage (if any), plus snapshotSeq
All document_updates with seq > snapshotSeq — the tail since that snapshot

SQL

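-- append (one transaction)
-- Postgres rejects FOR UPDATE combined with an aggregate such as MAX(seq),
-- so lock the per-document counter row and allocate seq from it.
SELECT latest_update_seq + 1 AS next_seq
FROM document_state
WHERE document_id = $1
FOR UPDATE;

UPDATE document_state SET latest_update_seq = $2 WHERE document_id = $1;
INSERT INTO document_updates (document_id, seq, update_bytes)
VALUES ($1, $2, $3); -- $2 = next_seq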
 
-- tail fetch
SELECT seq, update_bytes FROM document_updates
WHERE document_id = $1 AND seq > $2
ORDER BY seq ASC;

A new client rebuilds state by applying snapshot + tail — not replaying from seq 0 every time. If snapshot download fails, the server falls back to full replay rather than sending broken partial state.

This is event-sourcing 101: compact checkpoint + log tail.
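A minimal sketch of how a doc-init payload could be assembled, including the fallback to full replay. The types and the `buildDocInit` helper are hypothetical, the log is an in-memory array standing in for the tail-fetch query, and the snapshot loader is injected so a storage failure can be simulated.

```typescript
type UpdateRow = { seq: number; updateBytes: Uint8Array };
type Snapshot = { snapshotSeq: number; bytes: Uint8Array };
type DocInit = { snapshot: Snapshot | null; tail: UpdateRow[] };

// Hypothetical helper; `log` is assumed to be sorted by seq ascending.
function buildDocInit(
  log: UpdateRow[],
  loadSnapshot: () => Snapshot | null, // may throw if object storage is unavailable
): DocInit {
  let snapshot: Snapshot | null = null;
  try {
    snapshot = loadSnapshot();
  } catch {
    snapshot = null; // fall back to full replay instead of sending partial state
  }
  const fromSeq = snapshot ? snapshot.snapshotSeq : 0;
  return { snapshot, tail: log.filter((row) => row.seq > fromSeq) };
}

const log: UpdateRow[] = [1, 2, 3, 4].map((seq) => ({
  seq,
  updateBytes: new Uint8Array([seq]),
}));

const healthy = buildDocInit(log, () => ({ snapshotSeq: 3, bytes: new Uint8Array([9]) }));
const degraded = buildDocInit(log, () => {
  throw new Error("storage unavailable");
});
console.log(healthy.tail.length, degraded.tail.length); // 1 4
```

The full-replay branch only works while the log still holds every row from seq 1, so it stops being available for documents whose early updates have been pruned.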

#Edit flow: relay, don't rewrite

On yjs-update:

TypeScript

const seq = await appendUpdate({ documentId, updateBytes: update });
Y.applyUpdate(inMemoryCache.doc, Buffer.from(update), "remote");
socket.to(documentId).emit("yjs-update", { documentId, seq, update });
maybeWriteSnapshot(inMemoryCache); // every N updates or T seconds

Critical details:

Persist before fan-out — the log is source of truth; reconnects depend on it.
In-memory Y.Doc cache — snapshotting reads from cache, not by replaying the entire log on every edit.
Binary payloads — a keystroke is tens of bytes, not a full file JSON.
FOR UPDATE on seq allocation — prevents duplicate sequence numbers under concurrent writers to same doc.
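The "persist before fan-out" rule is easiest to see when the write fails. This is a synchronous, in-memory sketch with hypothetical names (`ToyRelay`, `Frame`, the injected `persist` callback). It stands in for the Postgres append and the Socket.IO broadcast and is not the post's server code.

```typescript
type Frame = { documentId: string; seq: number; update: Uint8Array };

class ToyRelay {
  private log: Frame[] = [];
  private listeners: Array<(frame: Frame) => void> = [];
  private persist: (frame: Frame) => void;

  constructor(persist: (frame: Frame) => void) {
    this.persist = persist;
  }

  subscribe(listener: (frame: Frame) => void): void {
    this.listeners.push(listener);
  }

  // Persist first; only a durable update is broadcast.
  handleUpdate(documentId: string, update: Uint8Array): number {
    const frame: Frame = { documentId, seq: this.log.length + 1, update };
    this.persist(frame); // throws on failure, so nothing below runs
    this.log.push(frame);
    for (const listener of this.listeners) listener(frame);
    return frame.seq;
  }
}

const received: number[] = [];
let storageUp = true;
const relay = new ToyRelay(() => {
  if (!storageUp) throw new Error("write failed");
});
relay.subscribe((frame) => {
  received.push(frame.seq);
});

relay.handleUpdate("doc-1", new Uint8Array([1])); // persisted, then broadcast
storageUp = false;
try {
  relay.handleUpdate("doc-1", new Uint8Array([2])); // rejected before fan-out
} catch {
  // The sender can retry; peers never saw an update the log does not contain.
}
console.log(received); // [ 1 ]
```

With the order reversed, peers could hold an op that a reconnecting client can never fetch from the log.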

The codebase still contains the previous implementation, commented in place: it replaced the entire Y.Text contents and broadcast document-updated with the whole string. That comment block is the best documentation of why the change mattered.

#Presence is ephemeral; document state is not

Awareness (remote cursors, selections) rides on awareness-update events via the Yjs awareness protocol. It is not persisted. When you join, the server sends existing clients an awareness-request, and they re-send their state so you see their cursors immediately.

Document content and chat history are persisted (Postgres). Separating ephemeral presence from durable state keeps the storage model honest.

Data	Durability	Transport
`Y.Text` updates	Postgres append log + snapshots	`yjs-update`
Awareness (cursors)	None (in-memory only)	`awareness-update`
Chat messages	Postgres rows	REST + WS notify

#Durability vs latency

Concern	Approach
Local responsiveness	Apply edits to local `Y.Doc` immediately (optimistic); network confirms propagation
Durability	Append every update to Postgres in a transaction with row lock on seq
Join latency	Snapshots to object storage; replay only the tail
Server restart	Rebuild in-memory cache from snapshot + tail on first access

Snapshot policy (configurable via env):

SNAPSHOT_EVERY_N_UPDATES — default 50
SNAPSHOT_EVERY_MS — default 30_000
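The "either threshold" rule can be sketched as a pure function. The two variable names and defaults come from the list above; the `readSnapshotPolicy` and `shouldSnapshot` helpers, and the choice to skip a snapshot when nothing has changed, are assumptions made for illustration.

```typescript
type SnapshotPolicy = { everyNUpdates: number; everyMs: number };

// Parse the two env vars, falling back to the stated defaults on missing or bad input.
function readSnapshotPolicy(env: Record<string, string | undefined>): SnapshotPolicy {
  const parse = (raw: string | undefined, fallback: number): number => {
    const value = Number(raw);
    return Number.isFinite(value) && value > 0 ? value : fallback;
  };
  return {
    everyNUpdates: parse(env["SNAPSHOT_EVERY_N_UPDATES"], 50),
    everyMs: parse(env["SNAPSHOT_EVERY_MS"], 30_000),
  };
}

// Snapshot when either threshold is reached, provided there is something new to save.
function shouldSnapshot(
  updatesSinceSnapshot: number,
  msSinceSnapshot: number,
  policy: SnapshotPolicy,
): boolean {
  if (updatesSinceSnapshot === 0) return false; // assumption: no empty snapshots
  return updatesSinceSnapshot >= policy.everyNUpdates || msSinceSnapshot >= policy.everyMs;
}

const policy = readSnapshotPolicy({}); // defaults: 50 updates or 30 seconds
console.log(shouldSnapshot(50, 0, policy)); // true: count threshold
console.log(shouldSnapshot(3, 30_000, policy)); // true: time threshold
console.log(shouldSnapshot(3, 1_000, policy)); // false: neither reached
```

Taking the environment as a plain record keeps the function testable; a server would pass its real environment object in.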

After successful snapshot write at snapshotSeq, optional prune:

SQL

DELETE FROM document_updates
WHERE document_id = $1 AND seq <= $2;

Pruning caps log storage and replay length, but it also removes the full-replay fallback for everything at or below snapshotSeq, so it is only safe after the snapshot object is verified readable.
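One way to enforce "verify, then prune" is to make the prune conditional on a successful read-back. The sketch below uses an in-memory log and a hypothetical `SnapshotStore` interface in place of Postgres and object storage, so the helper names and the key format are assumptions.

```typescript
type Row = { seq: number; updateBytes: Uint8Array };

interface SnapshotStore {
  put(key: string, bytes: Uint8Array): void;
  get(key: string): Uint8Array | undefined;
}

function sameBytes(a: Uint8Array, b: Uint8Array): boolean {
  return a.length === b.length && a.every((byte, i) => byte === b[i]);
}

// Write the snapshot, read it back, and prune only if the round trip matches.
function snapshotThenPrune(
  store: SnapshotStore,
  key: string,
  snapshotBytes: Uint8Array,
  snapshotSeq: number,
  log: Row[],
): Row[] {
  store.put(key, snapshotBytes);
  const readBack = store.get(key);
  if (!readBack || !sameBytes(readBack, snapshotBytes)) {
    return log; // keep every row: the log is still the only trustworthy copy
  }
  return log.filter((row) => row.seq > snapshotSeq); // mirrors DELETE ... seq <= $2
}

const memory = new Map<string, Uint8Array>();
const goodStore: SnapshotStore = {
  put: (key, bytes) => {
    memory.set(key, bytes);
  },
  get: (key) => memory.get(key),
};
const lossyStore: SnapshotStore = { put: () => {}, get: () => undefined };

const rows: Row[] = [1, 2, 3].map((seq) => ({ seq, updateBytes: new Uint8Array([seq]) }));
const snap = new Uint8Array([7, 7]);
console.log(snapshotThenPrune(goodStore, "doc-1/2", snap, 2, rows).length); // 1
console.log(snapshotThenPrune(lossyStore, "doc-1/2", snap, 2, rows).length); // 3
```

If the read-back fails, nothing is deleted, so the next join can still replay the full log.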

#Everything else a "portfolio editor" still needs

Real-time text sync is the core, but production-shaped software has boundaries:

Auth — Clerk session JWT verified on REST and Socket.IO handshake; document membership enforced in Postgres (document_members roles: owner / editor / viewer).
Sharing — people picker (Clerk Backend getUserList when configured, Postgres fallback), share links, access requests, notification inbox.
Rate limiting — Redis sliding-window limiters (chat, AI, code execution, search) — atomic Lua scripts, fail-closed on sensitive paths.
Deploy — frontend on Vercel; WebSocket server on Fly.io with min_machines_running = 1 so live sessions aren't torn down on scale-to-zero.
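For the rate-limiting item, a sliding-window limiter with a fail-closed wrapper can be sketched in memory. This is a stand-in for the Redis and Lua version, not a copy of it: the class, the key format, and the limits are hypothetical, and single-threaded JavaScript supplies the atomicity that the Lua script provides in Redis.

```typescript
// In-memory stand-in for the Redis limiter; names and limits are hypothetical.
class SlidingWindowLimiter {
  private hits = new Map<string, number[]>();
  private limit: number;
  private windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Allow the call if fewer than `limit` hits landed in the trailing window.
  allow(key: string, nowMs: number): boolean {
    const recent = (this.hits.get(key) ?? []).filter((t) => t > nowMs - this.windowMs);
    const allowed = recent.length < this.limit;
    if (allowed) recent.push(nowMs);
    this.hits.set(key, recent);
    return allowed;
  }
}

// Fail closed: if the limiter itself errors, deny the request.
function allowOrDeny(check: () => boolean): boolean {
  try {
    return check();
  } catch {
    return false;
  }
}

const limiter = new SlidingWindowLimiter(2, 1000); // 2 calls per second per key
const results = [0, 100, 200, 1150].map((t) =>
  allowOrDeny(() => limiter.allow("user-1:chat", t)),
);
console.log(results); // [ true, true, false, true ]

const whenBackendDown = allowOrDeny(() => {
  throw new Error("limiter backend unreachable");
});
console.log(whenBackendDown); // false
```

Passing the clock in as `nowMs` keeps the window logic deterministic under test; a server would pass the current time.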

The editor is a learning project that grew up — not a Google Docs competitor, but intentionally built with the same class of problems: concurrency, persistence, auth, and ops.

#What I'd tell my past self

Ship the naive version first — you'll feel why CRDTs exist.
Don't benchmark vibes — track update processing time server-side if you want numbers; design for local-first UX regardless.
The server is a relay + log, not a source of truth for merge logic. If you're implementing OT/CRDT merge rules in Node, you're probably fighting your library.
Snapshot + tail matters the day someone opens a 10MB document with 50k edits. CRDT correctness without join performance still feels broken.

CRDTs feel like overkill until two cursors collide and nothing breaks. After that, they feel like the only sane default for live editing.

#Append-only updates under row lock

Each Yjs binary frame is persisted with a monotonic sequence inside a Postgres transaction. The server locks document_state with FOR UPDATE, increments latest_update_seq, and inserts into document_updates — fan-out happens only after commit:

JavaScript

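await client.query('begin;');
try {
  const stateRes = await client.query(
    `select latest_update_seq from document_state where document_id = $1 for update;`,
    [documentId]
  );
  const nextSeq = BigInt(stateRes.rows[0].latest_update_seq) + 1n;
  // Store the incremented counter; otherwise the next writer would reuse this seq.
  await client.query(
    `update document_state set latest_update_seq = $2 where document_id = $1;`,
    [documentId, nextSeq.toString()]
  );
  await client.query(
    `insert into document_updates (document_id, seq, actor_user_id, update) values ($1, $2, $3, $4);`,
    [documentId, nextSeq.toString(), actorUserId || null, Buffer.from(updateBytes)]
  );
  await client.query('commit;');
} catch (err) {
  await client.query('rollback;'); // release the row lock on failure
  throw err;
}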

That ordering guarantees tail replay is deterministic even when two editors type at once.

#Snapshot cadence and join performance

Snapshots upload to R2 when either threshold fires — env-tunable without redeploying client code:

Env var	Default	Role
`SNAPSHOT_EVERY_N_UPDATES`	50	Count-based flush
`SNAPSHOT_EVERY_MS`	30_000	Time-based flush for idle docs
`PRUNE_UPDATES_BEFORE_SNAPSHOT`	false	Optional tail trim after snapshot

On join, the server sends the latest snapshot bytes plus updates after snapshotSeq. If R2 download fails but a snapshot pointer exists, it falls back to full replay rather than handing the client a broken partial state — a failure mode that showed up during local dev without object storage configured.

#Closing thought

Correct merge logic still feels broken if join takes ten seconds—snapshot cadence and tail replay are user-facing performance, not storage trivia.

#Reader field guide

When CRDT relay + log beats OT or last-write-wins: Two or more people edit the same buffer at once; reconnects must merge without a central ordering authority; you can accept append-only growth with periodic snapshots.

Yjs / sync failure modes to plan for

Snapshot object missing or corrupt — fall back to full replay; never ship partial snapshot bytes.
Broadcast before Postgres commit — reconnecting clients miss ops; persist, then fan-out.
Duplicate seq under concurrent writers — allocate sequence inside FOR UPDATE; otherwise tail order is nondeterministic.
Empty in-memory Y.Doc after restart — rebuild from snapshot + tail on first access, not from a blank doc.
Skipping state-vector diff on reconnect — client replays entire history; use encodeStateAsUpdate(doc, remoteStateVector) for incremental catch-up.
Treating awareness as durable — cursors disappear on refresh by design; do not write awareness rows to Postgres.
Mixing string overwrites with binary Yjs frames — keep transport opaque binary end-to-end.

Operational checklist

IdP-issued JWT verified on both HTTP requests and the WebSocket handshake
Snapshot thresholds tuned for your largest expected document
Object storage scoped to snapshot prefix; verify read after write before optional tail prune
WebSocket host keeps at least one warm instance if scale-to-zero kills live sessions

#On this site

Post	Why
Building a Gemini AI backend with SSE	Server → client streaming — different problem than CRDT merge
Building a browser music visualizer with Goose	Browser real-time loops (`requestAnimationFrame`) without shared state
Lessons from building a mobile events social platform	Firebase real-time — last-write-wins unless you invest in CRDTs

#References (curated)

Yjs is the implementation; the CRDT survey paper is the “why merge order stops mattering” argument I give skeptical backend folks.

Reference	Notes
Yjs documentation	Updates, snapshots, persistence hooks—read before you design your own op log.
y-monaco	Binding Monaco’s model to a `Y.Text` without fighting the editor’s internal undo stack.
Socket.IO rooms	Document-scoped fan-out—binary Yjs frames, not JSON chat payloads.
A comprehensive study of Convergent and Commutative Replicated Data Types (Shapiro et al.)	Formal grounding for “apply in any order, converge always.”