Sanity Portable Text vs Markdown vs HTML for Headless Storage

You modeled an article body as a Markdown string, shipped it, and six months later product wants the third paragraph to render as an interactive callout on the web, plain text in the mobile app, and a clean summary for an LLM. Now you're writing a Markdown parser with regex, discovering that two editors used different heading conventions, and praying nobody pasted raw HTML into a field that gets injected straight into the DOM. The "simplest" storage format quietly became the most expensive one.

The choice between Portable Text, Markdown, and HTML for rich-text storage isn't a syntax preference; it's a decision about whether your content is *data* or a *blob*. Markdown and HTML are presentation formats masquerading as storage formats: they assume a single rendering target and bake formatting decisions into the stored string. The moment you have more than one channel, or you want a machine to reason about the content, the blob has to be re-parsed and its intent guessed at.

This guide reframes the question around portability, queryability, and editorial safety. Portable Text, Sanity's structured rich-text format, treats text as an array of typed blocks with spans, marks, and annotations, so the content stays addressable everywhere it travels. We'll compare the three head to head across rendering, custom content, AI readability, migration, and lock-in.

The core tension: presentation format vs structured data

Markdown and HTML both answer the question "how should this text look?" Markdown encodes that answer as terse punctuation (`**bold**`, `## heading`), HTML as a tag tree (`<strong>`, `<h2>`). Both are excellent at their actual job: describing a document destined for one rendering surface. The trouble starts when you treat that document as your canonical store and then ask it to serve a website, a native app, a voice assistant, a PDF export, and a retrieval pipeline. None of those targets wants HTML's `<div>` soup or Markdown's HTML-passthrough ambiguity, so each one re-parses the blob and reconstructs intent it can only guess at.

Portable Text inverts the model. Instead of a string you parse, content is a JSON array where every paragraph is a block with a `_type`, a `style`, and an array of `children` spans. Formatting lives in each span's `marks`: simple decorators such as bold or italic are named directly, and anything richer is a key that points at an entry in the block's `markDefs`. A link isn't an `<a href>` embedded in text; it's an annotation, a structured object attached to a span that carries whatever fields you define (target, rel, internal reference, tracking id). The text and the metadata about the text are separate, addressable things.
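
To make that shape concrete, here is a minimal sketch of one paragraph with an emphasised, linked word. The interfaces are hand-written for illustration rather than taken from any Sanity package, and the `href` value and the `trackingId` field are hypothetical.

```typescript
// Hand-written illustrative types; real projects would normally rely on the
// types that ship with their Portable Text tooling.
interface Span {
  _type: 'span';
  _key: string;
  text: string;
  marks: string[]; // decorator names ("strong", "em") or keys into markDefs
}

interface MarkDef {
  _type: string; // e.g. "link"
  _key: string;
  [field: string]: unknown; // whatever fields the schema defines
}

interface Block {
  _type: 'block';
  _key: string;
  style: string; // "normal", "h2", ...
  children: Span[];
  markDefs: MarkDef[];
}

// "Read the spec first." with "spec" emphasised and linked.
const paragraph: Block = {
  _type: 'block',
  _key: 'b1',
  style: 'normal',
  markDefs: [
    // Hypothetical annotation fields: href and trackingId.
    { _type: 'link', _key: 'l1', href: 'https://example.com/spec', trackingId: 'cta-1' },
  ],
  children: [
    { _type: 'span', _key: 's1', text: 'Read the ', marks: [] },
    { _type: 'span', _key: 's2', text: 'spec', marks: ['em', 'l1'] },
    { _type: 'span', _key: 's3', text: ' first.', marks: [] },
  ],
};

// The text and the metadata about the text are separately addressable.
const plainText = paragraph.children.map((s) => s.text).join('');
const linkedSpans = paragraph.children.filter((s) =>
  s.marks.some((m) => paragraph.markDefs.some((d) => d._key === m)),
);
```

Nothing here is parsed: the plain text is a `map` over spans, and "which words are linked" is a lookup from a mark key to its `markDefs` entry.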

That distinction is the whole article in miniature. With Markdown or HTML, presentation decisions are frozen into the stored value, and every consumer inherits them. With Portable Text, the stored value is intent ("this span is emphasised," "this run links to that document"), and presentation is a decision each renderer makes at the edge. You can map the same block to a React component on web, a `Text` node in React Native, and a flat sentence for an embedding model, all from one source of truth that never had to be regex'd.

Rendering across channels: one source, many surfaces

The honest case for Markdown is the static site: a `.md` file plus a build-time renderer gets you a blog with almost no infrastructure. HTML's case is the email or the legacy CMS export where the consumer literally is a browser. Both fall down the instant the second channel appears with different rules. Markdown has no standard for a two-column layout, a product card, or a callout with a typed severity; HTML can express those but only as opaque markup your mobile app has to sanitise, reparse, and frequently throw away.

Portable Text was designed for the many-surfaces reality. Because each block is typed, rendering is a lookup: you hand the array to a serializer and provide a component per block type and per mark type. On the web that's `@portabletext/react`; the same array drives Vue, Svelte, Swift, or a plain-text serializer for search snippets. A heading block becomes an `<h2>` in one target and a styled `Text` in another: the renderer decides, the content doesn't. There's no "strip the HTML and hope" step because there was never HTML in the store to begin with.
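
The "rendering is a lookup" idea can be sketched without any library. The following is a hand-rolled illustration, not the `@portabletext/react` API: `Target`, `htmlTarget`, `textTarget`, and `render` are hypothetical names, and only two styles and two decorators are handled.

```typescript
type Span = { _type: 'span'; text: string; marks: string[] };
type Block = { _type: 'block'; style: string; children: Span[] };

type Wrap = (inner: string) => string;

// One lookup table per rendering target.
interface Target {
  escape: Wrap;                 // how raw text is made safe for this target
  styles: Record<string, Wrap>; // block style -> wrapper
  marks: Record<string, Wrap>;  // decorator -> wrapper
  fallback: Wrap;               // used for unknown block styles
}

const escapeHtml: Wrap = (raw) =>
  raw.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');

const htmlTarget: Target = {
  escape: escapeHtml,
  styles: { normal: (t) => `<p>${t}</p>`, h2: (t) => `<h2>${t}</h2>` },
  marks: { strong: (t) => `<strong>${t}</strong>`, em: (t) => `<em>${t}</em>` },
  fallback: (t) => `<p>${t}</p>`,
};

const textTarget: Target = {
  escape: (raw) => raw,
  styles: { normal: (t) => t, h2: (t) => t.toUpperCase() },
  marks: {}, // plain text ignores decorators
  fallback: (t) => t,
};

const identity: Wrap = (t) => t;

function render(blocks: Block[], target: Target): string {
  return blocks
    .map((block) => {
      const inner = block.children
        .map((span) =>
          span.marks.reduce(
            (acc, mark) => (target.marks[mark] ?? identity)(acc),
            target.escape(span.text),
          ),
        )
        .join('');
      return (target.styles[block.style] ?? target.fallback)(inner);
    })
    .join('\n');
}

const doc: Block[] = [
  { _type: 'block', style: 'h2', children: [{ _type: 'span', text: 'Pricing & plans', marks: [] }] },
  {
    _type: 'block',
    style: 'normal',
    children: [
      { _type: 'span', text: 'Plans start ', marks: [] },
      { _type: 'span', text: 'today', marks: ['strong'] },
      { _type: 'span', text: '.', marks: [] },
    ],
  },
];

const asHtml = render(doc, htmlTarget);
const asText = render(doc, textTarget);
```

The stored array is identical for both calls; only the lookup tables differ, which is why adding a new channel means adding a table rather than a parser.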

This is also where Visual Editing and the Presentation tool earn their keep. Because the content is structured and addressable, Sanity can stitch the live front-end preview back to the exact block being edited without abandoning the headless model. An editor clicks the paragraph on the rendered page and lands on that block in Sanity Studio. You can't reliably do that with a Markdown blob, because there's no stable identity for "the third paragraph" once it's been parsed and re-laid-out by the front-end. Structure is what makes the round trip from rendered pixel back to editable field possible.

Custom content: embeds, components, and design-system mapping

Real content is never just prose. It's prose interrupted by a pricing table, a code sandbox, an image with a caption and a hotspot, a callout, a related-products carousel. In Markdown the standard escape hatch is to drop raw HTML or invent shortcode syntax (`{{< youtube id >}}`), at which point you've abandoned Markdown's portability and built a bespoke parser nobody else can read. In HTML you embed the component markup directly, coupling your stored content to one front-end's class names and DOM shape forever.

Portable Text treats these as first-class members of the same array. Alongside text blocks you store custom objects, `{_type: 'callout', tone: 'warning', body: [...]}` or `{_type: 'productRef', product: {_ref: '...'}}`, defined with the same `defineType` schema you use everywhere else in Sanity. The editor gets a real custom input component inside Sanity Studio for that object, not a textarea where they paste fragile markup. The renderer gets a typed object it maps to a design-system component. Because the embed is a reference, not a copy, the carousel always reflects the live product, and GROQ can follow that `_ref` with `->` to fetch the joined data in the same query.
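
On the consuming side, a body that mixes text blocks with custom objects can be modeled as a discriminated union. This is a sketch under assumptions: the type shapes are hand-written from the two examples above, and the component names (`Paragraph`, `Callout`, `ProductCard`) stand in for whatever a design system actually exposes.

```typescript
type Span = { _type: 'span'; text: string; marks: string[] };
type TextBlock = { _type: 'block'; _key: string; style: string; children: Span[] };
type Callout = { _type: 'callout'; _key: string; tone: 'info' | 'warning'; body: TextBlock[] };
type ProductRef = { _type: 'productRef'; _key: string; product: { _ref: string } };
type BodyItem = TextBlock | Callout | ProductRef;

// What a renderer would hand to the design system: a component name plus props.
type Rendered = { component: string; props: Record<string, unknown> };

const blockText = (b: TextBlock): string => b.children.map((s) => s.text).join('');

function toComponent(item: BodyItem): Rendered {
  switch (item._type) {
    case 'block':
      return { component: 'Paragraph', props: { style: item.style, text: blockText(item) } };
    case 'callout':
      return {
        component: 'Callout',
        props: { tone: item.tone, text: item.body.map(blockText).join(' ') },
      };
    case 'productRef':
      // Only the reference is stored; the product itself is fetched elsewhere.
      return { component: 'ProductCard', props: { productId: item.product._ref } };
    default: {
      // Compile-time check that every member of BodyItem is handled.
      const unhandled: never = item;
      throw new Error(`Unhandled body item: ${JSON.stringify(unhandled)}`);
    }
  }
}

const body: BodyItem[] = [
  {
    _type: 'block', _key: 'a', style: 'normal',
    children: [{ _type: 'span', text: 'Choose a plan.', marks: [] }],
  },
  {
    _type: 'callout', _key: 'b', tone: 'warning',
    body: [{
      _type: 'block', _key: 'b1', style: 'normal',
      children: [{ _type: 'span', text: 'Prices change in May.', marks: [] }],
    }],
  },
  { _type: 'productRef', _key: 'c', product: { _ref: 'product-123' } },
];

const rendered = body.map(toComponent);

// References can be collected up front, for example to resolve them in one query.
const productIds = body
  .filter((item): item is ProductRef => item._type === 'productRef')
  .map((item) => item.product._ref);
```

Adding a new embed type to the union makes the `never` branch fail to compile until the renderer handles it, which is the practical payoff of keeping the content model and the component model aligned.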

This is the difference between a rich-text field that tolerates components and one built for them. Markdown's component story is shortcodes plus a custom compiler; HTML's is inline markup plus aggressive sanitisation. Portable Text's is: the block array can hold any object your schema defines, the Studio renders an editing surface for it, and TypeGen emits the TypeScript type so your front-end consumes it without `any`. The content model and the component model are the same model.

AI and agent readability: structure beats string-scraping

When an LLM or an automated agent consumes your content, format quality stops being cosmetic and becomes correctness. Hand a model a wall of HTML and a meaningful fraction of its context window is spent on `<div class="...">` noise; the model also has to infer which text is a heading, which is boilerplate navigation, and which is the actual answer. Markdown is leaner but still a string the model parses heuristically, and any embedded HTML or shortcode reintroduces the ambiguity.

Portable Text hands a machine the structure directly. Blocks carry their `style` (so "this is an H2" is a field, not a `#` the model has to recognise), marks and annotations are explicit, and custom objects announce their own `_type`. An agent walking the array knows precisely what each node is without scraping. Annotations are the underrated part: a link, a footnote, a glossary term, or a citation is attached to the exact span it covers, with its own typed fields, so a retrieval pipeline can extract "this claim cites that source" as data rather than reverse-engineering it from anchor tags.
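
As an illustration of extracting "this claim cites that source" as data, the sketch below walks an array and pairs each cited span with its source and the nearest preceding heading. The `citation` annotation type and its `source` field are hypothetical schema choices, and the sample content is invented for the example.

```typescript
type Span = { _type: 'span'; text: string; marks: string[] };
type CitationDef = { _type: 'citation'; _key: string; source: string };
type LinkDef = { _type: 'link'; _key: string; href: string };
type MarkDef = CitationDef | LinkDef;
type Block = { _type: 'block'; style: string; children: Span[]; markDefs: MarkDef[] };

interface CitedClaim {
  heading: string | null; // nearest preceding heading, if any
  claim: string;          // exact text the annotation covers
  source: string;         // typed field on the annotation
}

function extractCitations(blocks: Block[]): CitedClaim[] {
  const results: CitedClaim[] = [];
  let heading: string | null = null;

  for (const block of blocks) {
    // "This is a heading" is a field on the block, not a pattern in a string.
    if (/^h[1-6]$/.test(block.style)) {
      heading = block.children.map((s) => s.text).join('');
      continue;
    }
    for (const span of block.children) {
      for (const def of block.markDefs) {
        if (def._type === 'citation' && span.marks.indexOf(def._key) !== -1) {
          results.push({ heading, claim: span.text, source: def.source });
        }
      }
    }
  }
  return results;
}

const article: Block[] = [
  {
    _type: 'block', style: 'h2', markDefs: [],
    children: [{ _type: 'span', text: 'Performance', marks: [] }],
  },
  {
    _type: 'block', style: 'normal',
    markDefs: [{ _type: 'citation', _key: 'c1', source: 'doc-benchmark-report' }],
    children: [
      { _type: 'span', text: 'Builds finished ', marks: [] },
      { _type: 'span', text: 'faster after the change', marks: ['c1'] },
      { _type: 'span', text: '.', marks: [] },
    ],
  },
];

const citations = extractCitations(article);
```

The function never inspects anchor tags or guesses at boundaries: the annotation's key on the span says exactly which words the citation covers.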

This matters because the same structured store feeds both humans and machines without a second pipeline. The Content Lake holds queryable, schema-aware content; GROQ projections can reshape a Portable Text field into exactly the slice a downstream system needs in one round trip. You're not maintaining a clean Markdown export for the website and a separate scrub job for the model: the canonical content is already structured, already addressable, and already the thing both consumers read. With Markdown or HTML as your store, the AI-readable version is a derived artifact you have to generate, validate, and keep in sync.

Migration and lock-in: how trapped is your content?

Every format claims portability; the test is what happens when you leave. Markdown's pitch is that it's just text, but "just text" is exactly the problem, because every non-trivial document leans on a flavor (CommonMark vs GitHub vs MDX) and a pile of shortcodes that only your build pipeline understands. HTML is portable into a browser and almost nowhere else cleanly; pulling structured data back out of arbitrary HTML is the screen-scraping problem you were trying to avoid. Both are easy to write and genuinely hard to migrate without loss.

Portable Text is an open specification, and its serialized form is plain JSON you fully own: you can query it, transform it, or export it without Sanity in the loop. The structure that makes it portable in is the same structure that makes it portable out. Because blocks, marks, and annotations are typed, a transform to Markdown, HTML, or another system's format is a deterministic serialization, not a parse-and-guess. You can write that serializer once and trust it, because there's no ambiguity to resolve. Going the other direction, importing legacy Markdown or HTML into Portable Text, is a one-time parse you run with eyes open, after which the content is structured forever.
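
A serializer of that kind can be quite small. The sketch below is an assumption-laden illustration rather than a complete exporter: it covers three heading levels, blockquotes, `strong`, `em`, and a `link` annotation with an `href` field, and it does not escape Markdown special characters in the text.

```typescript
type Span = { _type: 'span'; text: string; marks: string[] };
type LinkDef = { _type: 'link'; _key: string; href: string };
type Block = { _type: 'block'; style: string; children: Span[]; markDefs: LinkDef[] };

// Block style -> Markdown line prefix. Unknown styles fall back to a paragraph.
const STYLE_PREFIX: Record<string, string> = {
  h1: '# ',
  h2: '## ',
  h3: '### ',
  blockquote: '> ',
};

function spanToMarkdown(span: Span, markDefs: LinkDef[]): string {
  let out = span.text;
  // Decorators are applied innermost first, in a fixed order.
  if (span.marks.indexOf('em') !== -1) out = `*${out}*`;
  if (span.marks.indexOf('strong') !== -1) out = `**${out}**`;
  // Annotations wrap the decorated text.
  for (const def of markDefs) {
    if (span.marks.indexOf(def._key) !== -1) out = `[${out}](${def.href})`;
  }
  return out;
}

function toMarkdown(blocks: Block[]): string {
  return blocks
    .map((block) => {
      const prefix = STYLE_PREFIX[block.style] ?? '';
      const line = block.children.map((s) => spanToMarkdown(s, block.markDefs)).join('');
      return prefix + line;
    })
    .join('\n\n');
}

const exported = toMarkdown([
  {
    _type: 'block', style: 'h2', markDefs: [],
    children: [{ _type: 'span', text: 'Setup', marks: [] }],
  },
  {
    _type: 'block', style: 'normal',
    markDefs: [{ _type: 'link', _key: 'k1', href: 'https://example.com/docs' }],
    children: [
      { _type: 'span', text: 'See the ', marks: [] },
      { _type: 'span', text: 'docs', marks: ['strong', 'k1'] },
      { _type: 'span', text: '.', marks: [] },
    ],
  },
]);
```

Every branch keys off a typed field, so the same input always yields the same output; the lossy direction is Markdown to Portable Text, where flavor differences have to be resolved during import.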

The lock-in question also cuts the other way. With Markdown-in-files, your real lock-in is the bespoke toolchain (the shortcode compiler, the front-matter conventions, the build scripts) that no other system replicates. With Portable Text, the lock-in surface is an open, documented JSON shape with community serializers in many languages. That's a meaningfully different bet: you're committing to a spec you can read and re-implement, not to an accretion of undocumented parsing rules.

A decision framework: when each format is the right answer

Be honest about the cases where the heavier format loses. If your content is a single developer's blog, rendered by one static-site generator, and will never leave that pipeline, Markdown-in-Git is a perfectly good answer: the structure tax buys you nothing and the version-control workflow is a real benefit. If you're emitting content into a context that is literally an HTML consumer and nothing else (a transactional email template, a legacy portal), storing HTML can be pragmatic. Format choice should follow the number of channels and the lifespan of the content, not fashion.

Reach for Portable Text the moment any of three things is true: the content serves more than one rendering target, the content needs typed embeds your editors place themselves, or a machine needs to read the content as data. Those conditions describe almost every real product CMS: a marketing site plus an app, a docs site plus an in-product help panel, a catalog plus a recommendation model. The cost of structure is paid once, at schema-design time; the cost of an unstructured blob is paid repeatedly, every time a new consumer has to re-parse it.

The practical migration path is rarely all-or-nothing. You can adopt Portable Text for the fields that travel (body copy, descriptions, anything with embeds) while keeping genuinely single-target snippets as plain strings. Inside Sanity, those Portable Text fields come with Studio custom inputs for the embeds, GROQ to query and reshape them, Visual Editing to preview them, and TypeGen to type them end to end. The decision isn't "structured everywhere"; it's "structured wherever the content has more than one job to do," which, in a headless stack, is most places.

Portable Text vs Markdown vs HTML as a headless storage format

Feature	Portable Text (Sanity)	Markdown
Storage model	Typed JSON array of blocks, spans, marks and annotations; content is addressable data, not a string.	A presentation string; structure inferred at parse time and varies by flavor (CommonMark vs GitHub vs MDX).
Multi-channel rendering	One array, a serializer per block/mark type: `@portabletext/react` on web, plain-text for snippets, native on mobile.	Re-parsed per channel; no standard for layout or typed blocks, so each target reconstructs intent.
Custom embeds / components	Any object your schema defines lives in the array with a real Studio custom input; renderer maps to a design-system component.	Raw HTML passthrough or bespoke shortcodes ({{< … >}}) that need a custom compiler nobody else reads.
References / joins in content	Embeds are references; GROQ follows `_ref` with `->` to fetch joined data (live product, author) in one round trip.	No native reference concept; you store an id and resolve it yourself in the build.
AI / agent readability	Machine reads structure directly: `style`, marks and `_type` are fields, not patterns to scrape; annotations carry typed citations.	Leaner than HTML but still string-parsed heuristically; embedded HTML/shortcodes reintroduce ambiguity.
Type safety in front-end	TypeGen emits TypeScript from your schema, so block and embed types flow to the renderer without `any`.	Untyped strings; front-matter and shortcode shapes are hand-maintained.
Portability out / lock-in	Open spec, plain JSON you own; deterministic serialization to Markdown/HTML because every node is typed.	"Just text," but the real lock-in is the bespoke toolchain: shortcode compiler, front-matter, build scripts.