Why Headless CMS Performance Is a Content-Model Problem

Your homepage takes 1.4 seconds to assemble its hero, navigation, promo rail, and product grid, and the cause is not your frontend framework. It is six separate API calls, three of them over-fetching entire reference trees just to render a title and a slug, because the content model was designed for the editor's convenience and never for the read path. Most "headless is slow" complaints trace back here: a model whose references the API cannot resolve for you, which forces the client to stitch, waterfall, and over-fetch on every request.

Performance in a headless stack is overwhelmingly a content-model problem, not a CDN problem. The shape of your data dictates how many round trips a page needs, how much payload crosses the wire, and whether a single query can resolve a view or the client has to become a query planner. Sanity is the Content Operating System for the AI era, and its model-first design (GROQ projections, the Content Lake, and Portable Text) exists precisely so that the read path is a deliberate decision rather than an accident of how someone laid out fields.

This guide reframes the conversation. We will look at where modeling decisions become latency, why query language matters more than cache headers, and how to design content so a page is one round trip instead of six.

The waterfall problem starts in the schema, not the network

Open the network tab on a slow headless page and you will usually see a staircase: one request returns a page document, that response contains reference IDs, the client fires a second batch to resolve them, those resolve to more IDs, and so on. Each step waits on the one before it. This is a request waterfall, and the latency it adds is not a function of bandwidth. It is a function of how many sequential round trips your view requires, and that number is decided in the content model long before a single byte moves.

The usual culprit is a model that mirrors the database's normalization without giving the read path a way to collapse it. References are correct for authoring (one author record, many articles) but expensive for reading if every reference forces a separate fetch. When the API can only return a document and its raw reference IDs, the client inherits the join. The client becomes a query planner it was never meant to be, sequencing calls, deduplicating, and assembling shapes by hand. Frontend developers then paper over this with client-side caching, prefetch hints, and loading skeletons, treating a modeling symptom as a rendering problem.
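To make the staircase concrete, here is a small sketch that counts how many sequential rounds a client needs when it has to resolve references itself. The in-memory `store`, the document IDs, and the 80 ms round-trip time are illustrative assumptions, not measurements, and server processing time is ignored.

```typescript
// A document as a reference-only API might return it: an ID plus raw reference IDs.
type Doc = { _id: string; refs: string[] };

// Hypothetical content: a page references a hero and an article; the article references an author.
const store = new Map<string, Doc>([
  ["page-home", { _id: "page-home", refs: ["hero-1", "article-1"] }],
  ["hero-1", { _id: "hero-1", refs: [] }],
  ["article-1", { _id: "article-1", refs: ["author-1"] }],
  ["author-1", { _id: "author-1", refs: [] }],
]);

// Counts sequential rounds when the client resolves references itself.
// Requests inside one round can run in parallel, but each round waits on the previous one.
function countSequentialRounds(rootId: string): number {
  const seen = new Set<string>();
  let frontier: string[] = [rootId];
  let rounds = 0;
  while (frontier.length > 0) {
    rounds += 1;
    for (const id of frontier) seen.add(id);
    const next = new Set<string>();
    for (const id of frontier) {
      for (const ref of store.get(id)?.refs ?? []) {
        if (!seen.has(ref)) next.add(ref);
      }
    }
    frontier = Array.from(next);
  }
  return rounds;
}

const ROUND_TRIP_MS = 80; // assumed network round-trip time
const clientResolvedMs = countSequentialRounds("page-home") * ROUND_TRIP_MS;
const serverResolvedMs = 1 * ROUND_TRIP_MS; // one query that follows references server-side
```

Under these assumptions the client-resolved page waits through three round trips (240 ms) before it can render, against one (80 ms) when the references are followed server-side. More bandwidth changes neither number; only the depth of the reference chain does.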

The fix is to make resolution a server concern. If the query layer can follow references and return the exact composite shape a view needs in a single round trip, the waterfall disappears at the source. In Sanity, GROQ does this with the dereference operator: a projection can follow `author->{name, slug}` inline, pull the three fields the byline actually renders, and skip the rest of the author document entirely. The model stays normalized for editors, and the read path stays flat for the browser, because the query reconciles the two rather than the client.
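As a rough sketch of what that looks like from a TypeScript frontend: the query below follows the author reference inline, and the interface describes the flat shape the view receives. The `article` type, its field names, and the `/authors/` URL pattern are assumptions for illustration, and the sketch projects `slug.current` to get a plain string rather than the slug object.

```typescript
// GROQ query: follow the author reference inline and keep only what the byline renders.
// The `article` type and its field names are assumptions for this sketch.
const ARTICLE_QUERY = `*[_type == "article" && slug.current == $slug][0]{
  title,
  "slug": slug.current,
  author->{name, "slug": slug.current}
}`;

// The shape the query above is expected to return: no reference IDs left to resolve.
interface ArticleView {
  title: string;
  slug: string;
  author: { name: string; slug: string } | null; // null when the reference is unset or missing
}

// A view helper that consumes the result directly; no second fetch is needed for the byline.
function renderByline(article: ArticleView): string {
  return article.author
    ? `By ${article.author.name} (/authors/${article.author.slug})`
    : "By staff";
}
```

The author document can keep growing (bio, social links, image crops) without changing what this view downloads, because the projection names the two fields it wants.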

Over-fetching: when your byline downloads the whole author

The second performance tax is payload. A REST endpoint that returns a full document hands you every field whether the view needs it or not. Render a list of twelve articles, each with a byline, and a naive implementation ships twelve complete author records (bio, social links, headshot crops, SEO metadata, the lot) just to display a name and a slug. Multiply that across references and a list view can balloon to hundreds of kilobytes of JSON that the client immediately discards. On a slow connection, that payload is the page-load time.
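A quick way to see the difference is to serialize both versions of the same list and compare sizes. Everything in this sketch is made up for illustration: the author record, its field names, and the field lengths. JSON string length is only a rough proxy for uncompressed bytes.

```typescript
// Hypothetical full author record, as a document-returning endpoint might embed it.
const fullAuthor = {
  _id: "author-1",
  name: "Ada Example",
  slug: "ada-example",
  bio: "A long biography paragraph. ".repeat(20),
  social: { mastodon: "@ada@example.social", website: "https://example.com" },
  headshot: { crops: ["1x1", "4x3", "16x9"], url: "https://example.com/ada.jpg" },
  seo: { title: "Ada Example", description: "Author page description. ".repeat(5) },
};

const titles = Array.from({ length: 12 }, (_, i) => `Article ${i + 1}`);

// Naive list: every article carries the complete author record.
const fullPayload = titles.map((title) => ({ title, author: fullAuthor }));

// Projected list: only the two fields the byline renders.
const projectedPayload = titles.map((title) => ({
  title,
  author: { name: fullAuthor.name, slug: fullAuthor.slug },
}));

// JSON string length approximates size on the wire before compression.
const fullSize = JSON.stringify(fullPayload).length;
const projectedSize = JSON.stringify(projectedPayload).length;
const savedFraction = 1 - projectedSize / fullSize;
```

Compression narrows the gap on the wire, since repeated records compress well, but the client still has to decompress, parse, and allocate everything it receives.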

GraphQL was supposed to solve this by letting clients select fields, and for flat selections it helps. But the shape you can request is still bound to the schema the API exposes, and resolving deeply nested references often means either over-fetching at intermediate levels or issuing additional queries. The selection is field-level, not shape-level: you choose which fields, but the response structure is dictated upstream. Aliasing, fragments, and nested resolvers mitigate this, yet the developer is still negotiating with a fixed schema rather than describing the result they want.

GROQ inverts the relationship. A projection in Sanity describes the exact shape the result should take, including renamed fields, computed values, filtered arrays, and dereferenced subsets, in one expression. You can ask for `"byline": author->name` and get a string, not an object. You can slice an array with `[0...12]`, filter related documents inline, and use `coalesce()` for fallbacks. The payload that crosses the wire is the payload the view consumes, nothing more. Over-fetching stops being the default and becomes something you would have to opt into deliberately.
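The following sketch puts several of those features into one projection and pairs it with the TypeScript type a list view would consume. Field names such as `shortTitle`, `publishedAt`, and `tags`, and the `/articles/` URL prefix, are assumptions. The runtime guard is just one hypothetical way to check that a response matches the intended shape.

```typescript
// GROQ projection: the response shape is defined by the query, not by the stored documents.
// Field names such as `shortTitle`, `publishedAt`, and `tags` are assumptions.
const ARTICLE_CARDS_QUERY = `*[_type == "article"] | order(publishedAt desc)[0...12]{
  "headline": coalesce(shortTitle, title),
  "href": "/articles/" + slug.current,
  "byline": author->name,
  "tags": tags[]->title
}`;

// What the view consumes: strings and arrays of strings, no nested documents.
interface ArticleCard {
  headline: string;
  href: string;
  byline: string | null;
  tags: string[] | null;
}

// Runtime guard: true only when a value matches the card shape exactly as typed above.
function isArticleCard(value: unknown): value is ArticleCard {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const tagsOk =
    v.tags === null ||
    (Array.isArray(v.tags) && v.tags.every((t) => typeof t === "string"));
  return (
    typeof v.headline === "string" &&
    typeof v.href === "string" &&
    (v.byline === null || typeof v.byline === "string") &&
    tagsOk
  );
}
```

Note that `"byline": author->name` yields a string, so a response carrying an author object in that position would fail the guard. That is the point: the contract is the shape the view needs, not the shape the document happens to have.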

One round trip per page: composing views in a single query

The highest-leverage performance move in a headless stack is collapsing a page into one request. A typical landing page is not one content type; it is a hero, a navigation tree, a featured collection, a promo band, and a footer, each potentially a different document or query. The instinct is to fetch them separately and assemble client-side, which reintroduces the waterfall and forces the browser to coordinate five loading states. A model and query layer that can compose unrelated shapes in one call eliminates that coordination.

GROQ supports this directly. A single query can return an object whose keys are independent sub-queries: `{"hero": *[_type == "page" && slug.current == $slug][0], "nav": *[_type == "navigation"][0]{items[]->{title, slug}}, "featured": *[_type == "product" && featured == true][0...6]}`. One round trip, one response, fully shaped for the view. The server does the fan-out against the Content Lake, which is the queryable, real-time store underneath Sanity, and the client receives a ready-to-render object. The number of round trips on the critical path stops scaling with the number of content blocks the page contains.

This is where the model and the query language reinforce each other. Because Sanity's model is schema-aware and the Content Lake indexes it, a composite query stays fast as the page grows in complexity. You add a new section to the layout by adding a key to the projection, not a new HTTP request to the critical path. For dynamic and collaborative surfaces, the Live Content API can stream updates to that same shape, so a real-time view does not regress to polling several endpoints.
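One way to keep that property visible in code is to build the page query from named sub-queries, so a new section is literally a new key. This is a minimal sketch: `composePageQuery` is a hypothetical helper rather than part of any Sanity package, and the `promo` type and its fields are assumptions.

```typescript
// Compose named GROQ sub-queries into one object query, so the page stays one request.
function composePageQuery(sections: Record<string, string>): string {
  const entries = Object.entries(sections).map(
    ([key, query]) => `  ${JSON.stringify(key)}: ${query.trim()}`
  );
  return `{\n${entries.join(",\n")}\n}`;
}

// The three sub-queries from the landing-page example.
const landingSections: Record<string, string> = {
  hero: `*[_type == "page" && slug.current == $slug][0]`,
  nav: `*[_type == "navigation"][0]{items[]->{title, slug}}`,
  featured: `*[_type == "product" && featured == true][0...6]`,
};

// Adding a section is a new key, not a new HTTP request. `promo` is a hypothetical type.
const withPromo: Record<string, string> = {
  ...landingSections,
  promo: `*[_type == "promo" && active == true][0]{heading, cta}`,
};

const landingQuery = composePageQuery(withPromo);
```

The resulting string is still a single GROQ query, so the client issues one fetch regardless of how many keys the object has. Each sub-query still costs something on the server, which is why per-section projections matter as much as the composition.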

Rich text is a performance decision: Portable Text vs HTML blobs

Most performance conversations stop at queries, but the format of your rich text is a quieter tax. CMSes that store body content as an HTML string hand the frontend an opaque blob it must parse, sanitize, and often re-hydrate to attach interactivity. Embedded media, references, and components are encoded as markup the client has to regex or DOM-walk to extract. For a single article that is tolerable; across a feed, a search index, or an AI pipeline that needs to read structure, parsing HTML on every consumer is real, repeated CPU.

Structured rich text changes the cost profile. Sanity's Portable Text represents body content as an array of typed blocks and spans rather than a markup string. A heading is a block with a style, a link is a mark with structured data, and an embedded product is a typed object with a reference, not an `<a>` tag you have to parse. The frontend maps block types to components once, and every consumer (web, native, search, or an agent reading the content) gets the same machine-readable tree without a parsing step in the hot path.
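To illustrate the "map block types to components once" idea, here is a deliberately small renderer over a Portable Text-style array. It is a sketch, not a complete implementation: marks and lists are ignored, the `productEmbed` type and the `<product-card>` element are hypothetical, and a production frontend would more likely use an existing Portable Text rendering library.

```typescript
// Simplified Portable Text-style nodes. Real blocks also carry _key, marks, and markDefs.
type Span = { _type: "span"; text: string };
type TextBlock = { _type: "block"; style?: string; children: Span[] };
type ProductEmbed = { _type: "productEmbed"; product: { _ref: string } }; // hypothetical custom type
type BodyNode = TextBlock | ProductEmbed;

const escapeHtml = (s: string): string =>
  s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");

// Block style to HTML tag; unknown styles fall back to a paragraph.
const styleToTag: Record<string, string> = {
  normal: "p",
  h2: "h2",
  h3: "h3",
  blockquote: "blockquote",
};

// One switch on `_type`: no HTML parsing, no DOM walking, no regex extraction.
function renderNode(node: BodyNode): string {
  switch (node._type) {
    case "block": {
      const tag = styleToTag[node.style ?? "normal"] ?? "p";
      const text = node.children.map((child) => escapeHtml(child.text)).join("");
      return `<${tag}>${text}</${tag}>`;
    }
    case "productEmbed":
      return `<product-card ref="${escapeHtml(node.product._ref)}"></product-card>`;
  }
}

function renderBody(body: BodyNode[]): string {
  return body.map(renderNode).join("\n");
}
```

A native app or a search indexer would swap `renderNode` for its own mapping and reuse the same input unchanged, which is the practical meaning of "no parsing step in the hot path".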

The operational payoff compounds at scale. Because Portable Text is queryable like any other field, you can project just the blocks you need, count embeds without rendering, or pull the first paragraph for a preview without shipping the whole body. Migrating to a new frontend framework does not mean re-parsing a decade of HTML, because the content was never HTML. The same structured tree that keeps the render path cheap also keeps the content portable across channels and readable by downstream systems, which is the read-path equivalent of paying down debt instead of servicing it.
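As a sketch of projecting just the parts of a body that a teaser needs: the query below asks for the first normal-style block and a count of embeds instead of the whole array. The field name `body`, the `productEmbed` type, and the 140-character limit are assumptions, and the block types are simplified.

```typescript
// GROQ: fetch only what a teaser card needs from the body, not the body itself.
// `body` and `productEmbed` are assumed names for this sketch.
const ARTICLE_TEASER_QUERY = `*[_type == "article" && slug.current == $slug][0]{
  title,
  "firstParagraph": body[_type == "block" && style == "normal"][0],
  "embedCount": count(body[_type == "productEmbed"])
}`;

type Span = { _type: "span"; text: string };
type TextBlock = { _type: "block"; style?: string; children: Span[] };

interface ArticleTeaser {
  title: string;
  firstParagraph: TextBlock | null; // null when the body has no normal-style block
  embedCount: number | null; // null when the body field is missing
}

// Flatten the block's spans into plain text and cut it to a preview length.
function teaserText(teaser: ArticleTeaser, maxLength = 140): string {
  const spans = teaser.firstParagraph ? teaser.firstParagraph.children : [];
  const text = spans.map((span) => span.text).join("");
  return text.length > maxLength ? text.slice(0, maxLength) : text;
}
```

The list view that uses this never downloads the remaining blocks, and the embed count arrives as a number rather than something the client derives by rendering.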

Caching is downstream of modeling: invalidation, not just hit rate

Teams reach for caching when pages are slow, and a CDN in front of a well-shaped API is genuinely the right move. But caching cannot rescue a bad model; it can only hide it until something changes. The hard problem in content caching has never been hit rate; it is invalidation: knowing precisely which cached responses a single edit should bust. A flat model where one logical change touches many denormalized copies makes invalidation a guessing game, and teams either over-invalidate (tanking hit rate) or under-invalidate (serving stale content) as a result.

A clean model with explicit references makes the dependency graph legible. When the query layer knows that a page composed an author via `author->`, the system can reason about which queries depend on which documents. Sanity exposes this lineage through Content Source Maps, which trace each rendered value back to the document and field it came from, so invalidation and Visual Editing can target exactly the affected content rather than blowing the whole cache. The same metadata that lets an editor click a value on the live site and land on the right field is the metadata that makes targeted cache busting tractable.
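The general idea of dependency-aware invalidation can be sketched independently of any particular product. The `DependencyIndex` class below is hypothetical and is not a Sanity API; it assumes you can collect the IDs of the documents a response was built from (for example, from lineage metadata returned alongside the query result) and that your cache is addressed by string keys.

```typescript
// Hypothetical in-process index from document ID to the cache keys that depend on it.
class DependencyIndex {
  private keysByDocument = new Map<string, Set<string>>();

  // Call after rendering: `documentIds` are the documents the response was built from.
  record(cacheKey: string, documentIds: string[]): void {
    for (const id of documentIds) {
      const keys = this.keysByDocument.get(id) ?? new Set<string>();
      keys.add(cacheKey);
      this.keysByDocument.set(id, keys);
    }
  }

  // Call on edit: returns only the cache keys whose responses used the edited document.
  keysToPurge(editedDocumentId: string): string[] {
    const keys = this.keysByDocument.get(editedDocumentId);
    return keys ? Array.from(keys).sort() : [];
  }
}

const index = new DependencyIndex();
index.record("page:/", ["page-home", "author-1"]);
index.record("page:/articles/groq", ["article-1", "author-1"]);
index.record("page:/about", ["page-about"]);

// Editing the author busts the two pages that render the byline and leaves /about cached.
const purge = index.keysToPurge("author-1");
```

With explicit references, the set of document IDs behind a response is small and knowable. With denormalized copies there is no such set to record, which is why invalidation degrades into purging everything.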

The lesson is ordering. Shape the model so views resolve in one query, store rich text as structure so render stays cheap, and only then put a CDN in front, where it amplifies a fast read path instead of masking a slow one. Performance work that starts at the cache layer treats the symptom; performance work that starts at the model removes the cause. The two are not alternatives, but the order in which you do them determines whether the cache is a force multiplier or a liability waiting on the next content edit.

How the read path performs by content-model and query design

Feature	Sanity	Contentful	Strapi	Hygraph
Composite page in one round trip	GROQ returns an object of independent sub-queries (hero, nav, featured) in a single call, fully shaped for the view.	GraphQL can batch a selection, but deeply nested references often need additional queries or intermediate over-fetch.	REST endpoints typically fetched per type; populate params help, but composing unrelated collections tends toward multiple calls.	GraphQL with content federation can join sources, though shaping arbitrary composite views still maps to schema-bound queries.
Following references without extra round trips	Dereference operator `author->{name, slug}` resolves inline and returns only the projected fields in the same query.	Linked entries resolve via include depth (`include` param) up to a depth limit, beyond which clients issue follow-up fetches.	`populate` resolves relations, but deep population can over-fetch and is capped to avoid heavy nested responses.	Nested GraphQL selections resolve relations server-side within the schema's exposed graph.
Requesting an exact result shape	Projections describe the shape: rename fields, compute values, slice arrays, coalesce fallbacks, all in one GROQ expression.	GraphQL selects fields and supports aliases and fragments; result structure follows the schema rather than an arbitrary shape.	REST returns the resource shape; field selection and population trim it, but response structure is largely fixed by the endpoint.	GraphQL field selection and aliases shape results within the bounds of the generated schema.
Rich-text storage format	Portable Text stores typed blocks, spans, marks, and embedded objects, queryable and parse-free on the render path.	Rich Text is a structured JSON document with a documented node tree, rendered via the rich-text renderer package.	Rich text can be Markdown or blocks depending on field config; HTML/Markdown variants require client-side parsing.	Rich Text exposes structured AST plus HTML/Markdown outputs through the API for client rendering.
Targeted cache invalidation lineage	Content Source Maps trace each value to its document and field, enabling precise invalidation and Visual Editing.	CDN caching with tag-based purging; mapping a rendered value back to its source field is left to the application.	Self-managed caching; invalidation strategy and dependency tracking are the application team's responsibility.	CDN-backed delivery with cache controls; field-level provenance for invalidation is implemented at the app layer.
Real-time updates to a shaped view	Live Content API streams updates to the same query shape, avoiding a regression to polling multiple endpoints.	Preview and webhooks support freshness; live streaming of a composite query shape is not the default delivery model.	Real-time requires custom websocket or polling implementation on top of the REST/GraphQL API.	Webhooks and cache purging drive freshness; live subscription to a composite shape is built at the app layer.