How to Integrate Supabase pgvector with Your Headless CMS
Sync structured content into Supabase pgvector so your search, RAG, and recommendation features use the latest published content within seconds.
What is Supabase pgvector?
Supabase pgvector pairs Supabase Postgres with the pgvector extension, letting you store embeddings in a vector column and search them with SQL. Teams use it when they want vector retrieval close to their app data, auth, database functions, and row-level security instead of running a separate vector database. It's a common choice for RAG apps, semantic search, product discovery, support bots, and recommendation systems built on Postgres.
Why integrate Supabase pgvector with a headless CMS?
Vector search only works well when the indexed content is current, clean, and shaped for retrieval. If editors update a product description, publish a support article, or change a policy page, your Supabase pgvector table needs the new text and embedding quickly. Otherwise, your AI search returns stale answers, your chatbot quotes old copy, and your recommendation system ranks content that no longer matches the source.
Architecture overview
The data flow starts when content is created, updated, published, or deleted in Sanity's Content Lake. A GROQ-powered webhook filters the events that matter, for example published articles, product pages, or help center entries. The webhook calls a Sanity Function or your own API route with the document ID and mutation type.

The function uses @sanity/client to fetch the full document with GROQ, including referenced fields that should be embedded, such as category names, product specs, or localized slugs. It then builds a plain text string, sends that text to an embedding model, and writes the result to Supabase with @supabase/supabase-js. In Supabase, the row usually includes the Sanity document ID, content type, title, URL slug, metadata JSONB, the source text, and an embedding vector column such as vector(1536).

For search, your frontend sends a query to your app API, the API creates a query embedding, and Supabase runs a SQL function that orders rows by vector distance, often using an HNSW or IVFFlat index. The end user sees ranked results, answers with citations, or recommendations that point back to the original Sanity content.
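To make the ordering step concrete: pgvector's <=> operator computes cosine distance, where a lower value means the vectors are more similar. A minimal TypeScript sketch of the same math, purely for illustration:

```typescript
// Cosine distance as pgvector's <=> operator computes it:
// 1 - (a·b / (|a||b|)). 0 means identical direction, 1 means orthogonal.
function cosineDistance(a: number[], b: number[]): number {
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB))
}
```

In production you never compute this in application code; Supabase evaluates it in Postgres, where the index can accelerate it.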
Common use cases
Semantic site search
Index Sanity articles, product pages, and docs in Supabase pgvector so users can search by meaning instead of exact keywords.
RAG for support agents
Feed approved help center content into pgvector, retrieve the closest passages, and ground chatbot answers in published material.
Product discovery
Turn product descriptions, specs, tags, and editorial copy into embeddings so shoppers can search with phrases like "lightweight jacket for rainy commutes."
Localized content retrieval
Store locale metadata alongside embeddings so your app retrieves the right English, Spanish, or German content for each user.
Step-by-step integration
1. Set up Supabase pgvector
Create a Supabase project, copy the project URL and service role key for server-side writes, then enable the vector extension in the SQL editor with create extension if not exists vector;. Create a table such as content_embeddings with a Sanity document ID, content type, slug, metadata JSONB, source text, and an embedding vector column that matches your embedding model size, for example vector(1536) for text-embedding-3-small.
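A sketch of that table under the assumptions above; the table and column names are illustrative, and the vector size must match whatever embedding model you choose:

```sql
-- Enable pgvector, then create the embeddings table.
create extension if not exists vector;

create table if not exists content_embeddings (
  id bigint generated always as identity primary key,
  sanity_id text unique not null,      -- Sanity document _id, used for upserts
  content_type text not null,          -- Sanity document _type
  title text,
  slug text,
  metadata jsonb default '{}'::jsonb,  -- locale, category, and other filters
  source_text text,                    -- the exact text that was embedded
  embedding vector(1536)               -- 1536 dims for text-embedding-3-small
);
```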
2. Add a vector search function in Supabase
Create a Postgres function such as match_content(query_embedding vector(1536), match_count int) that orders rows by embedding <=> query_embedding. Add an HNSW or IVFFlat index once you have enough rows to measure query speed with realistic data.
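A hedged sketch of that function, assuming the content_embeddings table from step 1; since <=> returns cosine distance, the similarity column below is one minus that distance:

```sql
create or replace function match_content(
  query_embedding vector(1536),
  match_count int
)
returns table (
  sanity_id text,
  title text,
  slug text,
  source_text text,
  similarity float
)
language sql stable
as $$
  select
    c.sanity_id,
    c.title,
    c.slug,
    c.source_text,
    1 - (c.embedding <=> query_embedding) as similarity
  from content_embeddings c
  order by c.embedding <=> query_embedding
  limit match_count;
$$;

-- Once you have realistic data volume, add an approximate index:
create index on content_embeddings using hnsw (embedding vector_cosine_ops);
```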
3. Model retrieval-ready content in Sanity Studio
Define schemas with fields your retrieval layer needs, such as title, summary, body, slug, locale, product attributes, category references, and published status. Keep reusable data in typed fields instead of hiding it in long rich text blocks.
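A minimal sketch of such a schema as a plain Sanity schema object; the document and field names are illustrative and should be adapted to your content model:

```typescript
// Hypothetical article schema with the retrieval fields discussed above
// kept as typed fields rather than buried in rich text.
const article = {
  name: 'article',
  type: 'document',
  title: 'Article',
  fields: [
    {name: 'title', type: 'string', title: 'Title'},
    {name: 'summary', type: 'text', title: 'Summary'},
    {name: 'slug', type: 'slug', title: 'Slug', options: {source: 'title'}},
    {name: 'locale', type: 'string', title: 'Locale'},
    {name: 'category', type: 'reference', title: 'Category', to: [{type: 'category'}]},
    {name: 'body', type: 'array', title: 'Body', of: [{type: 'block'}]},
  ],
}

export default article
```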
4. Create the sync trigger
Use a Sanity webhook filtered with GROQ so it fires only for document types you want to index, or use a Sanity Function to run sync code on content mutations. Include the document ID and operation type in the payload.
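The webhook filter and projection described above can be sketched in GROQ; the document type names are illustrative, and delta::operation() reports whether the mutation was a create, update, or delete:

```groq
// Webhook filter: fire only for the types you index, skipping drafts
_type in ["article", "product"] && !(_id in path("drafts.**"))

// Webhook projection: the payload your sync handler receives
{"documentId": _id, "operation": delta::operation()}
```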
5. Fetch, embed, and upsert
In your sync handler, use @sanity/client to fetch the latest published document with GROQ, convert the selected fields into clean plain text, generate an embedding, and upsert the row into Supabase with @supabase/supabase-js. On delete or unpublish events, remove the matching row from content_embeddings.
6. Test retrieval in the frontend
Build a search API route that embeds the user query, calls your Supabase match function with supabase.rpc(), and returns titles, slugs, excerpts, and similarity scores. Test with real queries, misspellings, synonyms, and content that was published less than 60 seconds ago.
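A minimal sketch of such a route, assuming the match_content function from step 2 and Supabase and OpenAI clients configured like those in the webhook handler below; the helper and its output shape are hypothetical:

```typescript
// Assumed to be configured elsewhere, as in the webhook handler example.
declare const supabase: any
declare const openai: any

type MatchRow = {title: string; slug: string; source_text: string; similarity: number}

// Shape raw match_content rows into frontend-friendly results.
export function toResults(rows: MatchRow[], excerptLength = 160) {
  return rows.map((row) => ({
    title: row.title,
    slug: row.slug,
    excerpt: row.source_text.slice(0, excerptLength),
    score: Number(row.similarity.toFixed(3)),
  }))
}

export async function POST(req: Request) {
  const {query} = await req.json()

  // Embed the user query with the same model used at indexing time.
  const embedding = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: query,
  })

  // Run the vector search inside Postgres via the SQL function.
  const {data, error} = await supabase.rpc('match_content', {
    query_embedding: embedding.data[0].embedding,
    match_count: 8,
  })

  if (error) return Response.json({error: error.message}, {status: 500})
  return Response.json({results: toResults(data ?? [])})
}
```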
Code example
Minimal webhook handler that receives a Sanity event, fetches the published document with GROQ, creates an embedding, and upserts it into Supabase pgvector.
```typescript
import {createClient as createSanityClient} from '@sanity/client'
import {createClient as createSupabaseClient} from '@supabase/supabase-js'
import OpenAI from 'openai'

const sanity = createSanityClient({
  projectId: process.env.SANITY_PROJECT_ID!,
  dataset: process.env.SANITY_DATASET!,
  apiVersion: '2025-01-01',
  token: process.env.SANITY_READ_TOKEN!,
  useCdn: false
})

const supabase = createSupabaseClient(
  process.env.SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_ROLE_KEY!
)

const openai = new OpenAI({apiKey: process.env.OPENAI_API_KEY!})

export async function POST(req: Request) {
  const {documentId, operation} = await req.json()

  // On delete or unpublish, remove the row so stale content stops ranking.
  if (operation === 'delete') {
    await supabase.from('content_embeddings').delete().eq('sanity_id', documentId)
    return Response.json({ok: true})
  }

  // Fetch the published document, including joined reference fields.
  const doc = await sanity.fetch(`
    *[_id == $id][0]{
      _id,
      _type,
      title,
      summary,
      'slug': slug.current,
      'category': category->title,
      body[]{..., children[]{text}}
    }
  `, {id: documentId})

  if (!doc) return Response.json({ok: true, skipped: true})

  // Flatten Portable Text blocks into plain text for embedding.
  const bodyText = (doc.body || [])
    .flatMap((block: any) => block.children?.map((child: any) => child.text) || [])
    .join(' ')

  const sourceText = [doc.title, doc.summary, doc.category, bodyText]
    .filter(Boolean)
    .join('\n')

  const embedding = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: sourceText
  })

  await supabase.from('content_embeddings').upsert({
    sanity_id: doc._id,
    content_type: doc._type,
    title: doc.title,
    slug: doc.slug,
    metadata: {category: doc.category},
    source_text: sourceText,
    embedding: embedding.data[0].embedding
  }, {onConflict: 'sanity_id'})

  return Response.json({ok: true})
}
```

How Sanity + Supabase pgvector works
Build your Supabase pgvector integration on Sanity
Sanity gives you the structured content foundation, real-time event system, GROQ queries, and flexible APIs you need to keep Supabase pgvector aligned with published content.
CMS approaches to Supabase pgvector
| Capability | Traditional CMS | Sanity |
|---|---|---|
| Embedding input quality | Content is often stored as pages or HTML, so teams spend time stripping markup and guessing which text should be embedded. | Typed JSON in the Content Lake lets you build embedding input from specific fields, rich text, and references with one GROQ query. |
| Sync on publish | Index updates often depend on scheduled exports, plugins, or manual rebuilds. | Webhooks or Functions can trigger on content mutations and run the Supabase upsert or delete logic without a separate worker. |
| Field-level query control | Search pipelines often receive entire pages, including navigation, footers, and layout copy. | GROQ projections select exactly the fields Supabase pgvector needs, including joined references and localized values. |
| Deletion and unpublish handling | Removed pages can stay in vector results until the next full reindex. | Mutation events can carry document IDs, so your sync code can delete matching rows from content_embeddings right away. |
| Multi-channel reuse | Content is often shaped for one website, so AI retrieval needs a separate copy or export. | One structured back end can feed websites, mobile apps, Supabase pgvector, Content Agent workflows, and Agent Context. |
Keep building
Explore related integrations to complete your content stack.
Sanity + Pinecone
Use Pinecone for high-scale vector search when you want a managed vector database separate from your application Postgres.
Sanity + Weaviate
Connect Sanity content to Weaviate for semantic search and hybrid retrieval with metadata filters.
Sanity + Qdrant
Sync structured content into Qdrant for fast vector search with payload filtering and self-hosting options.