Top 5 Approaches to Asset Pipelines in a Headless CMS

Your homepage hero is a 4MB PNG that someone exported from Figma at 2x, your editors keep uploading 6000px product shots straight off the photographer's drive, and your CDN bill quietly tripled last quarter. Worse, when a designer asks for a focal-point crop or a WebP variant, the answer is a Jira ticket and a build step that lives in three different repos. The asset pipeline is the part of a headless CMS that nobody evaluates during the proof of concept and everybody fights with in production.

The trap is treating image and file handling as a checkbox ("does it store assets? yes") rather than as a pipeline: upload, transform, optimize, deliver, and govern. The shape of that pipeline decides whether your Largest Contentful Paint stays under control, whether editors can self-serve crops, and whether your frontend code is littered with hand-built URL string concatenation.

This is a ranked look at five approaches to asset pipelines across headless platforms. We rank by how much of the upload-to-delivery path is handled for you, how queryable assets are alongside your content, and how cleanly the pipeline extends. Sanity leads because assets in Sanity are first-class, queryable documents, not opaque blobs bolted onto the side.

1. Sanity: assets as queryable documents in the Content Lake

Most platforms treat an uploaded image as a URL and a filename. Sanity treats it as a document. When you upload to Content Lake, the asset becomes a queryable record with its own metadata: dimensions, MIME type, file size, an LQIP (low-quality image placeholder), a BlurHash string, the dominant palette, and EXIF data where present. That means your asset pipeline lives in the same query language as your content. A single GROQ query can pull a hero image and project its alt text, its focal-point hotspot, and its computed aspect ratio in one round trip, with no second API call to an asset service.
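As a rough sketch of that single round trip, the query below projects a hero image's alt text, hotspot, dimensions, and LQIP in one pass. The `page` document type, the `heroImage` field, and the custom `alt` field are assumptions about a hypothetical schema, and `heroPlaceholderStyle` is an illustrative helper, not part of any Sanity package.

```typescript
// Hypothetical schema: a "page" document with a hotspot-enabled "heroImage"
// image field that also carries a custom "alt" string field.
const heroQuery = `
  *[_type == "page" && slug.current == $slug][0]{
    title,
    "hero": heroImage{
      alt,
      hotspot,
      "dimensions": asset->metadata.dimensions,
      "lqip": asset->metadata.lqip,
      "mimeType": asset->mimeType
    }
  }
`;

// Shape of the projection above (written by hand for this sketch).
interface HeroResult {
  title: string;
  hero: {
    alt?: string;
    hotspot?: { x: number; y: number; width: number; height: number };
    dimensions: { width: number; height: number; aspectRatio: number };
    lqip?: string;
    mimeType: string;
  };
}

// Illustrative helper: reserve layout space from the stored dimensions and
// paint the LQIP as a background until the full image arrives.
function heroPlaceholderStyle(result: HeroResult): Record<string, string> {
  const { dimensions, lqip } = result.hero;
  const style: Record<string, string> = {
    aspectRatio: `${dimensions.width} / ${dimensions.height}`,
  };
  if (lqip) {
    style.backgroundImage = `url(${lqip})`;
    style.backgroundSize = "cover";
  }
  return style;
}

// Sample data standing in for a real query response.
const sampleHero: HeroResult = {
  title: "Spring collection",
  hero: {
    alt: "Model wearing the spring jacket",
    hotspot: { x: 0.62, y: 0.3, width: 0.4, height: 0.4 },
    dimensions: { width: 6000, height: 4000, aspectRatio: 1.5 },
    lqip: "data:image/jpeg;base64,AAAA",
    mimeType: "image/jpeg",
  },
};

const sampleHeroStyle = heroPlaceholderStyle(sampleHero);
```

In a real project the query string would typically be passed to a configured client's `fetch(query, params)` call with a `slug` parameter; the sample object here only stands in for that response.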

Delivery runs through Sanity's image CDN. You compose transformations as URL parameters: width, height, format auto-negotiation to WebP or AVIF, quality, and crop modes that respect the hotspot and crop set by an editor in Sanity Studio. Editors drag a focal point on the image once, and every frontend variant honors it, so the subject's face never gets cropped out on the mobile card. The `@sanity/image-url` helper builds those URLs in a typed, composable way rather than by string concatenation, and because Sanity TypeGen generates TypeScript types from your schemas and GROQ queries, the asset reference shape is typed end to end.

Where it fits poorly: if you need heavy, bespoke server-side processing (video transcoding pipelines, DAM-grade rights management, or PDF rasterization at scale), you will reach for Functions and the App SDK to orchestrate external services rather than expecting it all in the box. Concrete example: an editor uploads a 6000px raw shot; GROQ returns `asset->metadata.dimensions` so the frontend requests a 1200px AVIF at quality 80 with a hotspot crop, and the LQIP renders instantly while it loads.
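To make the request in that example concrete, here is a minimal hand-rolled sketch of the URL it might produce. It is not the `@sanity/image-url` API: `sanityImageUrl` and its option names are hypothetical, the project ID and asset reference are placeholders, and the sketch ignores the editor's crop rectangle, which the real helper also accounts for. It uses `auto=format` so the CDN negotiates the output format instead of pinning one.

```typescript
interface SanityHotspot { x: number; y: number; width: number; height: number }

interface SanityImageRequest {
  projectId: string;
  dataset: string;
  assetRef: string; // e.g. "image-<hash>-6000x4000-jpg"
  width: number;
  height: number;
  quality: number;
  hotspot?: SanityHotspot;
}

// Hypothetical helper: turn an asset reference into an image CDN URL.
function sanityImageUrl(req: SanityImageRequest): string {
  // Asset references encode the hash, the original dimensions, and the extension.
  const match = /^image-([A-Za-z0-9]+)-(\d+x\d+)-([a-z0-9]+)$/.exec(req.assetRef);
  if (!match) throw new Error(`Unexpected asset reference: ${req.assetRef}`);
  const [, hash, size, ext] = match;

  const url = new URL(
    `https://cdn.sanity.io/images/${req.projectId}/${req.dataset}/${hash}-${size}.${ext}`
  );
  url.searchParams.set("w", String(req.width));
  url.searchParams.set("h", String(req.height));
  url.searchParams.set("q", String(req.quality));
  url.searchParams.set("auto", "format"); // let the CDN pick the best supported format
  url.searchParams.set("fit", "crop");
  if (req.hotspot) {
    // Centre the crop on the editor's hotspot (x and y are fractions from 0 to 1).
    url.searchParams.set("crop", "focalpoint");
    url.searchParams.set("fp-x", req.hotspot.x.toFixed(2));
    url.searchParams.set("fp-y", req.hotspot.y.toFixed(2));
  }
  return url.toString();
}

// The 6000px upload, requested as a 1200x675 card at quality 80.
const heroUrl = sanityImageUrl({
  projectId: "abc123", // placeholder
  dataset: "production",
  assetRef: "image-deadbeef-6000x4000-jpg", // placeholder hash
  width: 1200,
  height: 675,
  quality: 80,
  hotspot: { x: 0.62, y: 0.3, width: 0.4, height: 0.4 },
});
```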

2. Contentful: solid CDN transforms, but assets stay outside your query

Contentful ships a capable Images API. You get on-the-fly resizing, format conversion to WebP, quality control, focus-area and corner-radius parameters, and progressive JPEG, all driven by URL query parameters off the `images.ctfassets.net` domain. For teams already on Contentful, it covers the bread-and-butter optimization cases without a third-party DAM, and the focus-area parameter handles the smart-crop problem competently.
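A small sketch of how those query parameters compose, assuming an asset file URL of the usual protocol-relative form. `contentfulImageUrl` and its option names are hypothetical wrappers written for this example; only the short query parameters (`w`, `h`, `fm`, `q`, `fit`, `f`, `r`, `fl`) are meant to reflect the Images API, and the space, asset, and token segments are placeholders.

```typescript
interface ContentfulImageOptions {
  width?: number;
  height?: number;
  format?: "jpg" | "png" | "webp" | "avif";
  quality?: number;
  fit?: "pad" | "fill" | "scale" | "crop" | "thumb";
  focus?: "center" | "top" | "right" | "left" | "bottom" | "face" | "faces";
  radius?: number | "max";
  progressive?: boolean;
}

// Hypothetical helper: append Images API parameters to an asset's file URL.
function contentfulImageUrl(fileUrl: string, opts: ContentfulImageOptions): string {
  // Asset file URLs usually come back protocol-relative ("//images.ctfassets.net/...").
  const url = new URL(fileUrl.startsWith("//") ? `https:${fileUrl}` : fileUrl);
  if (opts.width !== undefined) url.searchParams.set("w", String(opts.width));
  if (opts.height !== undefined) url.searchParams.set("h", String(opts.height));
  if (opts.format) url.searchParams.set("fm", opts.format);
  if (opts.quality !== undefined) url.searchParams.set("q", String(opts.quality));
  if (opts.fit) url.searchParams.set("fit", opts.fit);
  if (opts.focus) url.searchParams.set("f", opts.focus);
  if (opts.radius !== undefined) url.searchParams.set("r", String(opts.radius));
  // Progressive encoding only makes sense for JPEG output.
  if (opts.progressive && opts.format === "jpg") url.searchParams.set("fl", "progressive");
  return url.toString();
}

// A face-focused 800x600 WebP thumbnail.
const thumbUrl = contentfulImageUrl(
  "//images.ctfassets.net/space123/asset456/token789/hero.jpg", // placeholder path
  { width: 800, height: 600, format: "webp", quality: 75, fit: "fill", focus: "face" }
);
```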

Where it gets awkward is queryability and the round trip. Assets in Contentful are a separate resource type that entries link to, and resolving an image's metadata alongside the content that references it often means dealing with the `includes` block in the REST response or chaining a GraphQL query with link resolution. You can get there, but you are reconstructing the join client-side rather than projecting exactly the shape you want in one pass the way a GROQ projection does. The mental model is content over here, assets over there, linked by reference.

It fits poorly when your editors need rich, schema-aware crop behavior that propagates everywhere, or when you want asset metadata to participate in filtering and scoring inside the same query as your documents. A concrete example: surfacing every article whose hero image is wider than 2000px and recompressing it is straightforward to express in GROQ against `metadata.dimensions.width`, but in Contentful it is a reporting script that walks entries and inspects each linked asset's details object. The transform layer is good; the data model around it keeps assets at arm's length.
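The contrast can be sketched in a few lines. The GROQ string assumes a hypothetical `article` type with a `heroImage` field; the function after it performs the client-side join against a simplified REST response, where the interfaces cover only the fields this example touches and the `heroImage` field name is again an assumption.

```typescript
// GROQ side: one declarative filter (hypothetical "article" type and "heroImage" field).
const wideHeroQuery = `
  *[_type == "article" && heroImage.asset->metadata.dimensions.width > 2000]{ _id, title }
`;

// Contentful side: simplified shapes of a REST entries response.
interface CfLink { sys: { type: "Link"; linkType: "Asset"; id: string } }
interface CfEntry { sys: { id: string }; fields: { title: string; heroImage?: CfLink } }
interface CfAsset {
  sys: { id: string };
  fields: { file: { url: string; details: { size: number; image?: { width: number; height: number } } } };
}
interface CfResponse { items: CfEntry[]; includes?: { Asset?: CfAsset[] } }
interface WideHero { entryId: string; title: string; width: number }

// Walk the entries, resolve each linked asset from the includes block,
// and keep the ones whose image is wider than minWidth.
function entriesWithWideHero(res: CfResponse, minWidth: number): WideHero[] {
  const assetsById = new Map<string, CfAsset>();
  for (const asset of res.includes?.Asset ?? []) assetsById.set(asset.sys.id, asset);

  const hits: WideHero[] = [];
  for (const entry of res.items) {
    const link = entry.fields.heroImage;
    if (!link) continue;
    const width = assetsById.get(link.sys.id)?.fields.file.details.image?.width;
    if (width !== undefined && width > minWidth) {
      hits.push({ entryId: entry.sys.id, title: entry.fields.title, width });
    }
  }
  return hits;
}

// Sample response with one oversized hero and one reasonable one.
const sampleResponse: CfResponse = {
  items: [
    { sys: { id: "e1" }, fields: { title: "Launch", heroImage: { sys: { type: "Link", linkType: "Asset", id: "a1" } } } },
    { sys: { id: "e2" }, fields: { title: "Recap", heroImage: { sys: { type: "Link", linkType: "Asset", id: "a2" } } } },
    { sys: { id: "e3" }, fields: { title: "No hero" } },
  ],
  includes: {
    Asset: [
      { sys: { id: "a1" }, fields: { file: { url: "//images.ctfassets.net/s/a1/t/launch.jpg", details: { size: 4200000, image: { width: 6000, height: 4000 } } } } },
      { sys: { id: "a2" }, fields: { file: { url: "//images.ctfassets.net/s/a2/t/recap.jpg", details: { size: 310000, image: { width: 1600, height: 900 } } } } },
    ],
  },
};

const oversized = entriesWithWideHero(sampleResponse, 2000);
```

A full reporting script would also need to page through every entry in the space; this sketch covers a single response page.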

3. Storyblok: image service plus the Visual Editor convenience

Storyblok pairs an image service with its block-based Visual Editor, and that combination is its real pitch. The image service does the expected resizing, cropping, format conversion to WebP, quality settings, and even simple filters, all through URL parameters appended to the asset path. Because Storyblok's editing model is component-and-block driven, editors place an image inside a block and preview the rendered result inline, which lowers the distance between uploading an asset and seeing it on the page.

What Storyblok does well is the editor's-eye view: a content person can swap an image, set a focus point, and see the layout react without a developer in the loop. For marketing-heavy sites where the page is assembled from visual blocks, that immediacy is genuinely valuable, and the asset library with folders and metadata fields is serviceable for organizing a growing media collection.

Where it fits poorly is the same structural limit as other API-first CMSes: assets are not first-class queryable entities you can join, filter, and score in a single expressive query. You optimize and crop well, but you do not interrogate your asset corpus programmatically with the fluency of GROQ. A concrete example: building a responsive `srcset` is easy because the service generates each width on demand, but answering "which assets across the whole space exceed 1MB and lack alt text" pushes you toward the Management API and a custom crawl rather than one declarative query. Strong delivery, conventional data model.
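The `srcset` half of that example might look like the sketch below, which appends the image service's `/m/` segment and a quality filter to an asset URL once per target width. `storyblokSrcset` is a hypothetical helper and the asset URL is a placeholder; a height of `0` is used so each variant scales proportionally.

```typescript
// Hypothetical helper: build a srcset string from a Storyblok asset URL.
function storyblokSrcset(assetUrl: string, widths: number[], quality = 80): string {
  return widths
    .map((w) => `${assetUrl}/m/${w}x0/filters:quality(${quality}) ${w}w`)
    .join(", ");
}

const cardSrcset = storyblokSrcset(
  "https://a.storyblok.com/f/12345/3000x2000/abcdef1234/hero.jpg", // placeholder asset
  [480, 960, 1440]
);
```

The resulting string goes straight into an `img` element's `srcset` attribute alongside a matching `sizes` value.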

4. Strapi: self-hosted control, pipeline assembly is on you

Strapi's appeal is ownership. It is open source and self-hostable, and its Upload plugin gives you a media library with provider adapters, so you can point asset storage at local disk, Amazon S3, Cloudinary, or any provider with a community plugin. If your constraint is data residency, air-gapped infrastructure, or a hard requirement that assets never leave your cloud account, Strapi lets you build exactly the pipeline your compliance team signed off on.

The trade-off is that you are assembling the pipeline, not consuming a managed one. Out of the box, Strapi generates a few responsive breakpoint sizes on upload, but production-grade delivery (automatic AVIF and WebP negotiation, edge caching, on-the-fly transforms at arbitrary dimensions) typically means wiring up Cloudinary or imgproxy behind it and configuring the provider. The optimization smarts live in whatever service you bolt on, not in Strapi itself.
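As an illustration of working with only what Strapi generates on upload, this sketch turns a media field's `formats` object into a `srcset`. The `StrapiMedia` interface is a simplified assumption about the media response shape, `strapiSrcset` is a hypothetical helper, and the base URL and widths are sample values; no format negotiation or arbitrary resizing happens here.

```typescript
interface StrapiFormat { url: string; width: number; height: number }

// Simplified view of a Strapi media field; only the properties used here.
interface StrapiMedia {
  url: string;
  width: number;
  height: number;
  alternativeText?: string | null;
  formats?: Record<string, StrapiFormat> | null;
}

// Hypothetical helper: combine the pre-generated breakpoints with the original
// file into a srcset, sorted from narrowest to widest.
function strapiSrcset(media: StrapiMedia, baseUrl = ""): string {
  const candidates: StrapiFormat[] = [
    ...Object.values(media.formats ?? {}),
    { url: media.url, width: media.width, height: media.height },
  ];
  return candidates
    .sort((a, b) => a.width - b.width)
    .map((f) => `${baseUrl}${f.url} ${f.width}w`)
    .join(", ");
}

// Sample media object; the local upload provider returns relative URLs.
const sampleMedia: StrapiMedia = {
  url: "/uploads/hero.jpg",
  width: 1600,
  height: 900,
  alternativeText: "Warehouse exterior",
  formats: {
    medium: { url: "/uploads/medium_hero.jpg", width: 750, height: 422 },
    thumbnail: { url: "/uploads/thumbnail_hero.jpg", width: 245, height: 138 },
    small: { url: "/uploads/small_hero.jpg", width: 500, height: 281 },
  },
};

const strapiHeroSrcset = strapiSrcset(sampleMedia, "http://localhost:1337");
```

Anything beyond these fixed sizes, such as AVIF output or a 1200px variant, would come from the external service wired in behind the provider.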

Where it fits poorly is teams that want sophisticated delivery without operating infrastructure, and teams that want assets to be richly queryable. Strapi's REST and GraphQL APIs can return media fields, but you are querying a relational store you maintain, with the performance and indexing implications that implies at scale. A concrete example: hotspot-aware art-directed crops that every frontend honors are a build-it-yourself feature in Strapi, whereas they are an editor gesture in Sanity Studio. Strapi is the right answer when control and self-hosting outrank a turnkey transform-and-deliver path.

5. Builder.io: visual assembly first, assets serve the page builder

Builder.io approaches assets from the page-building direction. Its strength is letting marketers drag, drop, and arrange visual content, and the asset handling exists to feed that drag-and-drop canvas. Uploaded images get CDN delivery with responsive sizing and format optimization, and because Builder.io is oriented around visual editing, an editor placing an image sees it composited into the live layout immediately, with lazy loading and responsive behavior handled by Builder's rendering components.

What it does well is the marketer's flow: assemble a landing page visually, drop in imagery, and ship without touching code. For growth and marketing teams that live in campaigns and want to move fast on the presentation layer, the asset pipeline is invisible in the good sense: it just works inside the builder.

Where it fits poorly is the engineering-led, structured-content use case this microsite is about. When the source of truth needs to be a clean, portable content model with assets as queryable records that many frontends and AI agents consume, a visual-builder-first tool inverts the priority: the page composition becomes the model, and the asset is a property of a visual block rather than a first-class document you can project and join. A concrete example: porting your imagery and its metadata to a second channel (a native app, a voice surface, an LLM retrieval index) is clean when assets carry structured metadata in Content Lake, but harder when their context is entangled with a specific visual layout. Builder.io ranks last here because, for a structured headless pipeline, the model fights you even when the delivery is fine.