AI Vector & Retrieval8 min read

How to Integrate Supabase pgvector with Your Headless CMS

Sync structured content into Supabase pgvector so your search, RAG, and recommendation features use the latest published content within seconds.

Published April 29, 2026
01 โ€” Overview

What is Supabase pgvector?

Supabase pgvector pairs Supabase Postgres with the pgvector extension, letting you store embeddings in a vector column and search them with SQL. Teams use it when they want vector retrieval close to their app data, auth, database functions, and row-level security instead of running a separate vector database. Itโ€™s a common choice for RAG apps, semantic search, product discovery, support bots, and recommendation systems built on Postgres.


02 โ€” The case for integration

Why integrate Supabase pgvector with a headless CMS?

Vector search only works well when the indexed content is current, clean, and shaped for retrieval. If editors update a product description, publish a support article, or change a policy page, your Supabase pgvector table needs the new text and embedding quickly. Otherwise, your AI search returns stale answers, your chatbot quotes old copy, and your recommendation system ranks content that no longer matches the source.


03 โ€” Architecture

Architecture overview

The data flow starts when content is created, updated, published, or deleted in Sanityโ€™s Content Lake. A GROQ-powered webhook filters the events that matter, for example published articles, product pages, or help center entries. The webhook calls a Sanity Function or your own API route with the document ID and mutation type. The function uses @sanity/client to fetch the full document with GROQ, including referenced fields that should be embedded, such as category names, product specs, or localized slugs. It then builds a plain text string, sends that text to an embedding model, and writes the result to Supabase with @supabase/supabase-js. In Supabase, the row usually includes the Sanity document ID, content type, title, URL slug, metadata JSONB, the source text, and an embedding vector column such as vector(1536). For search, your frontend sends a query to your app API, the API creates a query embedding, and Supabase runs a SQL function that orders rows by vector distance, often using an HNSW or IVFFlat index. The end user sees ranked results, answers with citations, or recommendations that point back to the original Sanity content.


04 โ€” Use cases

Common use cases

๐Ÿ”Ž

Semantic site search

Index Sanity articles, product pages, and docs in Supabase pgvector so users can search by meaning instead of exact keywords.

๐Ÿค–

RAG for support agents

Feed approved help center content into pgvector, retrieve the closest passages, and ground chatbot answers in published material.

๐Ÿ›๏ธ

Product discovery

Turn product descriptions, specs, tags, and editorial copy into embeddings so shoppers can search with phrases like โ€œlightweight jacket for rainy commutes.โ€

๐ŸŒ

Localized content retrieval

Store locale metadata alongside embeddings so your app retrieves the right English, Spanish, or German content for each user.


05 โ€” Implementation

Step-by-step integration

  1. 1

    Set up Supabase pgvector

    Create a Supabase project, copy the project URL and service role key for server-side writes, then enable the vector extension in the SQL editor with create extension if not exists vector;. Create a table such as content_embeddings with a Sanity document ID, content type, slug, metadata JSONB, source text, and an embedding vector column that matches your embedding model size, for example vector(1536) for text-embedding-3-small.

  2. 2

    Add a vector search function in Supabase

    Create a Postgres function such as match_content(query_embedding vector(1536), match_count int) that orders rows by embedding <=> query_embedding. Add an HNSW or IVFFlat index once you have enough rows to measure query speed with realistic data.

  3. 3

    Model retrieval-ready content in Sanity Studio

    Define schemas with fields your retrieval layer needs, such as title, summary, body, slug, locale, product attributes, category references, and published status. Keep reusable data in typed fields instead of hiding it in long rich text blocks.

  4. 4

    Create the sync trigger

    Use a Sanity webhook filtered with GROQ so it fires only for document types you want to index, or use a Sanity Function to run sync code on content mutations. Include the document ID and operation type in the payload.

  5. 5

    Fetch, embed, and upsert

    In your sync handler, use @sanity/client to fetch the latest published document with GROQ, convert the selected fields into clean plain text, generate an embedding, and upsert the row into Supabase with @supabase/supabase-js. On delete or unpublish events, remove the matching row from content_embeddings.

  6. 6

    Test retrieval in the frontend

    Build a search API route that embeds the user query, calls your Supabase match function with supabase.rpc(), and returns titles, slugs, excerpts, and similarity scores. Test with real queries, misspellings, synonyms, and content that was published less than 60 seconds ago.


06 โ€” Code

Code example

Minimal webhook handler that receives a Sanity event, fetches the published document with GROQ, creates an embedding, and upserts it into Supabase pgvector.

typescript
import {createClient as createSanityClient} from '@sanity/client'
import {createClient as createSupabaseClient} from '@supabase/supabase-js'
import OpenAI from 'openai'

const sanity = createSanityClient({
  projectId: process.env.SANITY_PROJECT_ID!,
  dataset: process.env.SANITY_DATASET!,
  apiVersion: '2025-01-01',
  token: process.env.SANITY_READ_TOKEN!,
  useCdn: false
})

const supabase = createSupabaseClient(
  process.env.SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_ROLE_KEY!
)

const openai = new OpenAI({apiKey: process.env.OPENAI_API_KEY!})

export async function POST(req: Request) {
  const {documentId, operation} = await req.json()

  if (operation === 'delete') {
    await supabase.from('content_embeddings').delete().eq('sanity_id', documentId)
    return Response.json({ok: true})
  }

  const doc = await sanity.fetch(`
    *[_id == $id][0]{
      _id,
      _type,
      title,
      summary,
      'slug': slug.current,
      'category': category->title,
      body[]{..., children[]{text}}
    }
  `, {id: documentId})

  if (!doc) return Response.json({ok: true, skipped: true})

  const bodyText = (doc.body || [])
    .flatMap((block: any) => block.children?.map((child: any) => child.text) || [])
    .join(' ')

  const sourceText = [doc.title, doc.summary, doc.category, bodyText]
    .filter(Boolean)
    .join('
')

  const embedding = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: sourceText
  })

  await supabase.from('content_embeddings').upsert({
    sanity_id: doc._id,
    content_type: doc._type,
    title: doc.title,
    slug: doc.slug,
    metadata: {category: doc.category},
    source_text: sourceText,
    embedding: embedding.data[0].embedding
  }, {onConflict: 'sanity_id'})

  return Response.json({ok: true})
}

07 โ€” Why Sanity

How Sanity + Supabase pgvector works

Build your Supabase pgvector integration on Sanity

Sanity gives you the structured content foundation, real-time event system, GROQ queries, and flexible APIs you need to keep Supabase pgvector aligned with published content.

Start building free โ†’

08 โ€” Comparison

CMS approaches to Supabase pgvector

CapabilityTraditional CMSSanity
Embedding input qualityContent is often stored as pages or HTML, so teams spend time stripping markup and guessing which text should be embedded.Typed JSON in the Content Lake lets you build embedding input from specific fields, rich text, and references with one GROQ query.
Sync on publishIndex updates often depend on scheduled exports, plugins, or manual rebuilds.Webhooks or Functions can trigger on content mutations and run the Supabase upsert or delete logic without a separate worker.
Field-level query controlSearch pipelines often receive entire pages, including navigation, footers, and layout copy.GROQ projections select exactly the fields Supabase pgvector needs, including joined references and localized values.
Deletion and unpublish handlingRemoved pages can stay in vector results until the next full reindex.Mutation events can carry document IDs, so your sync code can delete matching rows from content_embeddings right away.
Multi-channel reuseContent is often shaped for one website, so AI retrieval needs a separate copy or export.One structured back end can feed websites, mobile apps, Supabase pgvector, Content Agent workflows, and Agent Context.

09 โ€” Next steps

Keep building

Explore related integrations to complete your content stack.

Ready to try Sanity?

See how Sanity's Content Operating System powers integrations with Supabase pgvector and 200+ other tools.