AI Vector & Retrieval · 8 min read

How to Integrate Weaviate with Your Headless CMS

Add semantic and hybrid search over published content, with updates reaching Weaviate as editors publish.

Published April 29, 2026
01 - Overview

What is Weaviate?

Weaviate is an open-source vector database with managed hosting through Weaviate Cloud. It indexes objects as vectors, supports hybrid search with BM25 and vector similarity, and can connect to embedding and generative model providers for retrieval-augmented generation. Teams use it for semantic search, product discovery, recommendations, agent memory, and RAG systems.
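Weaviate's hybrid search blends a BM25 keyword ranking with a vector ranking, weighted by an `alpha` parameter (1 is pure vector search, 0 is pure keyword search). The sketch below is a simplified illustration of relative score fusion, one of Weaviate's fusion methods: normalize each result list's scores to the 0 to 1 range, then take a weighted sum. It is not Weaviate's internal code. The `normalize` and `fuse` names are hypothetical, and vector scores are assumed to be similarities where higher is better.

```typescript
// Simplified illustration of relative score fusion (not Weaviate's internal code).
type Scored = {id: string; score: number};

// Min-max normalize so the best hit in a list scores 1 and the worst scores 0.
function normalize(results: Scored[]): Map<string, number> {
  const out = new Map<string, number>();
  if (results.length === 0) return out;
  const scores = results.map((r) => r.score);
  const min = Math.min(...scores);
  const max = Math.max(...scores);
  for (const r of results) {
    // If every score is equal, treat each hit as a full match.
    out.set(r.id, max === min ? 1 : (r.score - min) / (max - min));
  }
  return out;
}

// alpha = 1 ranks by vector similarity only; alpha = 0 ranks by BM25 only.
// Vector scores are assumed to be similarities (higher is better), not distances.
function fuse(bm25: Scored[], vector: Scored[], alpha: number): Scored[] {
  const keyword = normalize(bm25);
  const semantic = normalize(vector);
  const ids = new Set<string>([...Array.from(keyword.keys()), ...Array.from(semantic.keys())]);
  return Array.from(ids)
    .map((id) => ({
      id,
      // A document missing from one list contributes 0 from that list.
      score: alpha * (semantic.get(id) ?? 0) + (1 - alpha) * (keyword.get(id) ?? 0),
    }))
    .sort((a, b) => b.score - a.score);
}
```

With `alpha` at 0.5, a document that ranks reasonably well in both lists can outrank one that tops only the keyword list, which is the behavior hybrid site search relies on.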


02 - The case for integration

Why integrate Weaviate with a headless CMS?

Vector search works best when the source content is clean, typed, and consistent. If your articles, products, docs, FAQs, and landing pages are already structured, you can send Weaviate the exact fields it needs, such as title, summary, body text, category, slug, locale, and publish date. That gives your search or RAG layer useful context without asking an embedding job to guess what matters from rendered HTML.
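To make the contrast with rendered HTML concrete: a Portable Text body is an array of typed blocks, so the plain text for an embedding can come from the span text alone, with no markup to strip. GROQ's `pt::text()` does this server-side. The sketch below shows the same idea in TypeScript, assuming a minimal, hypothetical `Block` shape and skipping non-text blocks such as images. It approximates `pt::text()` rather than reproducing it exactly.

```typescript
// Minimal, hypothetical Portable Text shape: text blocks hold spans in `children`;
// other block types (images, embeds, code) carry no `children` text.
type Block = {
  _type: string;
  children?: Array<{_type: string; text?: string}>;
};

// Concatenate span text within each text block, and separate blocks with a blank line.
function toPlainText(blocks: Block[]): string {
  return blocks
    .filter((block) => block._type === 'block' && Array.isArray(block.children))
    .map((block) => (block.children ?? []).map((child) => child.text ?? '').join(''))
    .join('\n\n');
}
```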

Any headless CMS can expose content through APIs, but the details matter. With Sanity as the AI Content Operating System, content lives as typed JSON in the Content Lake, GROQ selects only the fields Weaviate should index, and webhooks or Functions can run as soon as a document is published, updated, or deleted. Your vector index then follows editorial changes without polling every 10 minutes or running a nightly batch job.
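A GROQ-powered webhook can carry the operation in its payload through a projection such as `{_id, _type, "operation": delta::operation()}`, where `delta::operation()` reports `create`, `update`, or `delete`. The handler then only has to decide whether to upsert, delete, or ignore the event. A minimal sketch, assuming that payload shape and a hypothetical `INDEXED_TYPES` set; the trigger's own filter should already exclude drafts and other document types, so this is a defensive second check.

```typescript
// Assumed payload shape from a webhook projection like
// {_id, _type, "operation": delta::operation()}.
type WebhookEvent = {
  _id: string;
  _type?: string;
  operation?: 'create' | 'update' | 'delete';
};

type SyncAction =
  | {kind: 'upsert'; sanityId: string}
  | {kind: 'delete'; sanityId: string}
  | {kind: 'ignore'; reason: string};

// Hypothetical set of indexed document types; match it to your schema.
const INDEXED_TYPES = new Set(['article', 'product', 'faq']);

function planSync(event: WebhookEvent): SyncAction {
  // Draft (drafts.) and release version (versions.) IDs should never reach the index.
  if (event._id.startsWith('drafts.') || event._id.startsWith('versions.')) {
    return {kind: 'ignore', reason: 'not a published document'};
  }
  if (event._type && !INDEXED_TYPES.has(event._type)) {
    return {kind: 'ignore', reason: `type ${event._type} is not indexed`};
  }
  if (event.operation === 'delete') return {kind: 'delete', sanityId: event._id};
  // create and update both become an upsert of the current published version.
  return {kind: 'upsert', sanityId: event._id};
}
```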

The alternative is usually messier. Someone exports content to CSV, a script scrapes pages, or a separate worker tries to diff stale API responses. Those setups can work for a small docs site, but they break down when you have 20 locales, referenced product data, scheduled releases, and content that changes throughout the day.


03 - Architecture

Architecture overview

A typical Sanity and Weaviate integration starts with a publish event. When an editor publishes an article, product page, or FAQ in Sanity Studio, a GROQ-powered webhook or Sanity Function receives the mutation event. The handler takes the document ID from the event, queries the Content Lake with GROQ, and projects a retrieval-ready payload: for example, title, slug, excerpt, plain text from Portable Text, referenced topics, locale, and last published time.

The sync layer then writes to Weaviate, using the Weaviate SDK or REST API to upsert the object into a collection such as SanityArticle, SanityProduct, or SanityDoc. If the collection uses a Weaviate vectorizer module, Weaviate creates the embedding when the object is written. If you bring your own embeddings, the handler generates a vector first and sends it with the object. Deletes and unpublishes should remove the matching object through Weaviate's delete endpoint, using a deterministic UUID derived from the Sanity document ID so the same document always maps to the same Weaviate object.

At query time, your app or agent sends a search request to Weaviate, often a hybrid search with a text query and a limit of 5 or 10 results. Weaviate returns matching objects with scores and metadata. A frontend can render the title, excerpt, and URL, while an AI agent can use the returned text chunks as grounded context before generating an answer.
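Weaviate object IDs are UUIDs, while Sanity document IDs are arbitrary strings, so the mapping between them has to be deterministic for updates and deletes to hit the same object. A name-based UUID (RFC 4122 version 5, SHA-1) has that property. The `uuid` package's `v5()` implements this scheme; the sketch below spells it out with Node's built-in `crypto` module so the mapping is explicit. The `uuidV5` and `weaviateIdFor` names are hypothetical, and the namespace is simply an arbitrary fixed UUID.

```typescript
import {createHash} from 'node:crypto';

// RFC 4122 version 5 UUID: SHA-1 over namespace bytes + name, with version/variant bits set.
function uuidV5(name: string, namespace: string): string {
  const ns = Buffer.from(namespace.replace(/-/g, ''), 'hex');
  if (ns.length !== 16) throw new Error('namespace must be a UUID');
  const hash = createHash('sha1').update(ns).update(name, 'utf8').digest();
  const bytes = hash.subarray(0, 16);
  bytes[6] = (bytes[6] & 0x0f) | 0x50; // version 5
  bytes[8] = (bytes[8] & 0x3f) | 0x80; // RFC 4122 variant
  const hex = bytes.toString('hex');
  return [hex.slice(0, 8), hex.slice(8, 12), hex.slice(12, 16), hex.slice(16, 20), hex.slice(20)].join('-');
}

// Any fixed UUID works as the namespace; changing it later would re-key every object.
const SANITY_NAMESPACE = '7f6b6b5a-3a5d-4f5d-9c7b-1b2c3d4e5f60';

// Hypothetical helper: the same Sanity _id always yields the same Weaviate object ID.
function weaviateIdFor(sanityId: string): string {
  return uuidV5(sanityId, SANITY_NAMESPACE);
}
```

Because the ID is derived rather than stored, the delete path needs nothing but the Sanity document ID from the event.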


04 - Use cases

Common use cases

🔎

Hybrid site search

Index Sanity articles, product docs, and FAQs in Weaviate so visitors can search by exact keywords and semantic meaning in the same query.

🤖

RAG for support agents

Send approved help content from Sanity to Weaviate, then retrieve the top passages for grounded support responses.

πŸ›οΈ

Semantic product discovery

Vectorize product descriptions, specs, categories, and buying guides so shoppers can search for intent, such as “waterproof jacket for spring hikes.”

🌍

Locale-aware retrieval

Sync language, region, and market fields from Sanity so Weaviate queries can filter results before vector ranking.
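Locale filtering and hybrid ranking can go in a single request. A minimal sketch that builds a Weaviate GraphQL `Get` query with a `hybrid` clause and a `where` filter on `locale`, then posts it to the `/v1/graphql` endpoint. The class and property names (SanityArticle, locale, title, slug, excerpt) follow this guide's example collection, the defaults of `alpha: 0.5` and `limit: 5` are illustrative, and `buildHybridQuery` and `hybridSearch` are hypothetical helper names.

```typescript
type HybridSearchOptions = {
  query: string;
  locale: string;
  limit?: number;
  alpha?: number; // 0 = keyword (BM25) only, 1 = vector only
};

// JSON.stringify yields a quoted, escaped string literal that GraphQL also accepts,
// so user input cannot break out of the query.
function buildHybridQuery({query, locale, limit = 5, alpha = 0.5}: HybridSearchOptions): string {
  return `{
  Get {
    SanityArticle(
      hybrid: {query: ${JSON.stringify(query)}, alpha: ${alpha}}
      where: {path: ["locale"], operator: Equal, valueText: ${JSON.stringify(locale)}}
      limit: ${limit}
    ) {
      title
      slug
      excerpt
      _additional { id score }
    }
  }
}`;
}

async function hybridSearch(weaviateUrl: string, apiKey: string, opts: HybridSearchOptions) {
  const res = await fetch(`${weaviateUrl}/v1/graphql`, {
    method: 'POST',
    headers: {Authorization: `Bearer ${apiKey}`, 'Content-Type': 'application/json'},
    body: JSON.stringify({query: buildHybridQuery(opts)}),
  });
  if (!res.ok) throw new Error(await res.text());
  const json = (await res.json()) as {
    data?: {Get: {SanityArticle: unknown[]}};
    errors?: unknown[];
  };
  // GraphQL reports query errors in the body, often with an HTTP 200 status.
  if (json.errors?.length) throw new Error(JSON.stringify(json.errors));
  return json.data?.Get.SanityArticle ?? [];
}
```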


05 - Implementation

Step-by-step integration

  1. Create your Weaviate project

    Create a Weaviate Cloud cluster or run Weaviate locally with Docker. Create an API key, note the cluster URL, and create a collection such as SanityArticle with properties like sanityId, title, slug, excerpt, body, locale, topics, and updatedAt. Choose a vectorizer module, such as text2vec-openai, or plan to send your own vectors.

  2. Install the client packages

    In your sync service, install the packages you need with npm install @sanity/client uuid. If you prefer the Weaviate TypeScript SDK for queries and collection setup, also run npm install weaviate-client. Keep SANITY_PROJECT_ID, SANITY_DATASET, SANITY_READ_TOKEN, WEAVIATE_URL, and WEAVIATE_API_KEY in environment variables.

  3. Model retrieval-ready content in Sanity Studio

    Define schema fields that map cleanly to search objects: title, slug, excerpt, body as Portable Text, locale, topics as references, and publish metadata. Use schema validation to require fields that Weaviate search depends on, such as title and slug.

  4. Create the sync trigger

    Use a Sanity Function for server-side sync logic without external infrastructure, or create a webhook that calls your own API route. Filter the trigger to published document types, such as article, product, or faq (for example, _type in ["article", "product", "faq"]), and use a projection such as {_id, "operation": delta::operation()} so the payload carries the document ID and the operation.

  5. Fetch with GROQ, then upsert into Weaviate

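    In the handler, use @sanity/client to fetch the current published document. In GROQ, flatten Portable Text with pt::text(body) and follow references with topics[]->title, which returns the topic titles as an array of strings. Send the result to Weaviate with PUT /v1/objects/{className}/{id}, which replaces an existing object, and create the object with POST /v1/objects if the PUT returns 404 because it does not exist yet. The SDK data API is an alternative.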

  6. Test search and deletion paths

    Publish a test document, confirm it appears in Weaviate, update the title, confirm the object changes, then delete or unpublish it and confirm Weaviate removes it. Build your frontend search endpoint with Weaviate hybrid search and return a small result set, usually 5 to 10 items.
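The collection from step 1 can be created through Weaviate's REST schema endpoint (`POST /v1/schema`). A minimal sketch, assuming the text2vec-openai vectorizer and the property list used in this guide. Identifier-like fields (sanityId, slug, locale) use `field` tokenization so filters match whole values, and they are skipped by the vectorizer so they do not dilute the embedding. `sanityArticleClass`, `identifier`, and `createCollection` are hypothetical names.

```typescript
type PropertyDef = {
  name: string;
  dataType: string[];
  tokenization?: 'word' | 'field';
  moduleConfig?: Record<string, {skip: boolean}>;
};

// Assumed vectorizer module; swap it for your provider or use "none" with your own vectors.
const VECTORIZER = 'text2vec-openai';

// Identifiers are matched as whole values and left out of the embedding.
function identifier(name: string): PropertyDef {
  return {name, dataType: ['text'], tokenization: 'field', moduleConfig: {[VECTORIZER]: {skip: true}}};
}

const properties: PropertyDef[] = [
  identifier('sanityId'),
  identifier('slug'),
  identifier('locale'),
  {name: 'title', dataType: ['text']},
  {name: 'excerpt', dataType: ['text']},
  {name: 'body', dataType: ['text']},
  {name: 'topics', dataType: ['text[]']},
  {name: 'updatedAt', dataType: ['date']}, // Sanity's _updatedAt is an RFC 3339 timestamp
];

export const sanityArticleClass = {class: 'SanityArticle', vectorizer: VECTORIZER, properties};

export async function createCollection(weaviateUrl: string, apiKey: string): Promise<void> {
  const res = await fetch(`${weaviateUrl}/v1/schema`, {
    method: 'POST',
    headers: {Authorization: `Bearer ${apiKey}`, 'Content-Type': 'application/json'},
    body: JSON.stringify(sanityArticleClass),
  });
  // Fails if the class already exists, so run this once during setup.
  if (!res.ok) throw new Error(await res.text());
}
```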


06 - Code

Code example

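TypeScript · app/api/sanity-to-weaviate/route.ts

```typescript
import {createClient} from '@sanity/client';
import {v5 as uuidv5} from 'uuid';

const sanity = createClient({
  projectId: process.env.SANITY_PROJECT_ID!,
  dataset: process.env.SANITY_DATASET!,
  token: process.env.SANITY_READ_TOKEN!,
  apiVersion: '2025-01-01',
  useCdn: false,
  // Read only published content, so drafts never reach the index.
  perspective: 'published',
});

// Fixed namespace: the same Sanity _id always maps to the same Weaviate UUID.
const UUID_NS = '7f6b6b5a-3a5d-4f5d-9c7b-1b2c3d4e5f60';
const CLASS_NAME = 'SanityArticle';
const WEAVIATE_URL = process.env.WEAVIATE_URL!;
const headers = {
  Authorization: `Bearer ${process.env.WEAVIATE_API_KEY}`,
  'Content-Type': 'application/json',
};

// Assumes a webhook projection like {_id, "operation": delta::operation()}.
type SyncEvent = {_id: string; operation?: 'create' | 'update' | 'delete'};

async function deleteObject(id: string): Promise<void> {
  const res = await fetch(`${WEAVIATE_URL}/v1/objects/${CLASS_NAME}/${id}`, {
    method: 'DELETE',
    headers,
  });
  // 404 means the object was never indexed or is already gone.
  if (!res.ok && res.status !== 404) throw new Error(await res.text());
}

export async function POST(req: Request) {
  const event = (await req.json()) as SyncEvent;
  const id = uuidv5(event._id, UUID_NS);

  if (event.operation === 'delete') {
    await deleteObject(id);
    return Response.json({ok: true, deleted: id});
  }

  const doc = await sanity.fetch(
    `*[_id == $id][0]{
      _id,
      title,
      "slug": slug.current,
      excerpt,
      "body": pt::text(body),
      "topics": topics[]->title,
      locale,
      _updatedAt
    }`,
    {id: event._id},
  );

  // No published version (for example, the document was unpublished): drop it from the index.
  if (!doc) {
    await deleteObject(id);
    return Response.json({ok: true, deleted: id});
  }

  const object = {
    class: CLASS_NAME,
    id,
    properties: {
      sanityId: doc._id,
      title: doc.title,
      slug: doc.slug,
      excerpt: doc.excerpt,
      body: doc.body,
      topics: doc.topics ?? [],
      locale: doc.locale ?? 'en-US',
      updatedAt: doc._updatedAt,
    },
  };

  // PUT replaces an existing object; if it does not exist yet (404), create it with POST.
  let res = await fetch(`${WEAVIATE_URL}/v1/objects/${CLASS_NAME}/${id}`, {
    method: 'PUT',
    headers,
    body: JSON.stringify(object),
  });
  if (res.status === 404) {
    res = await fetch(`${WEAVIATE_URL}/v1/objects`, {
      method: 'POST',
      headers,
      body: JSON.stringify(object),
    });
  }

  // Throwing returns a 5xx, so the webhook delivery can be retried.
  if (!res.ok) throw new Error(await res.text());
  return Response.json({ok: true, indexed: id});
}
```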
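Event-driven sync covers ongoing edits, but the initial load or a full reindex is better run as a dedicated script that sends objects in batches. A minimal sketch, assuming Weaviate's REST batch endpoint (`POST /v1/batch/objects`), which writes each object under the ID you supply, and assuming objects are built with the same UUID v5 mapping as the webhook route so backfill and live sync target the same records. The batch endpoint can return HTTP 200 while individual objects fail, so the sketch inspects each per-object result. `chunk`, `backfill`, and the batch size of 100 are hypothetical choices; the input list would typically come from a GROQ query such as `*[_type == "article"]{...}`.

```typescript
type WeaviateObject = {class: string; id: string; properties: Record<string, unknown>};

// Assumed per-object result shape returned by the batch endpoint.
type BatchResult = Array<{id?: string; result?: {errors?: {error?: Array<{message: string}>}}}>;

// Split a list into consecutive batches of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (size <= 0) throw new Error('size must be positive');
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}

// Returns a list of "id: message" strings for objects Weaviate rejected.
async function backfill(
  weaviateUrl: string,
  apiKey: string,
  objects: WeaviateObject[],
  batchSize = 100,
): Promise<string[]> {
  const failures: string[] = [];
  for (const batch of chunk(objects, batchSize)) {
    const res = await fetch(`${weaviateUrl}/v1/batch/objects`, {
      method: 'POST',
      headers: {Authorization: `Bearer ${apiKey}`, 'Content-Type': 'application/json'},
      body: JSON.stringify({objects: batch}),
    });
    if (!res.ok) throw new Error(await res.text());
    // A 200 response can still contain per-object errors.
    const results = (await res.json()) as BatchResult;
    for (const r of results) {
      const errors = r.result?.errors?.error ?? [];
      if (errors.length > 0) failures.push(`${r.id}: ${errors.map((e) => e.message).join('; ')}`);
    }
  }
  return failures;
}
```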

07 - Why Sanity

How Sanity + Weaviate works

Build your Weaviate integration on Sanity

Sanity gives you the structured content foundation, real-time event system, and flexible APIs to connect editorial workflows with Weaviate search and retrieval.


08 - Comparison

CMS approaches to Weaviate

| Capability | Traditional CMS | Sanity |
| --- | --- | --- |
| Content shape for vector indexing | Content often lives as rendered pages or large HTML fields, so indexing needs cleanup rules before embeddings are useful. | Typed JSON in the Content Lake and GROQ joins return a retrieval-ready object in one query. |
| Sync timing | Teams often use exports, plugins, or scheduled jobs, which can leave vector results behind published content. | Webhooks or Functions can run on publish, update, and delete events, with GROQ filters controlling exactly what triggers a sync. |
| Field-level control | Indexing may include navigation text, layout copy, or hidden markup unless you write cleanup code. | GROQ selects the exact fields Weaviate needs, including plain text from Portable Text and fields from referenced documents. |
| Editorial workflow safety | Draft and published states can be hard to separate in export-based syncs. | Drafts, releases, and published content are distinct, so you can index only approved content and test preview search separately. |
| Server-side integration logic | You usually run a separate worker, plugin, or cron service for vector sync. | Functions can handle event-based sync without extra infrastructure, though high-volume historical backfills still belong in a dedicated script. |
| AI agent access | Agents often need a separate content copy or scraped index. | Agent Context lets production agents query scoped, schema-aware content while Weaviate handles vector retrieval use cases. |

09 - Next steps

Keep building

Explore related integrations to complete your content stack.

Ready to try Sanity?

See how Sanity's Content Operating System powers integrations with Weaviate and 200+ other tools.