How to Integrate Airbyte with Your Headless CMS
Keep Airbyte pipelines current when content changes by syncing structured headless CMS data into warehouses, apps, and automation workflows within minutes.
What is Airbyte?
Airbyte is an open-source data movement platform for ELT pipelines. Teams use it to move data between APIs, databases, warehouses, lakehouses, and operational tools through 550+ connectors. It’s common in data engineering, analytics, and operations teams that need repeatable syncs without writing a new connector for every source and destination.
Why integrate Airbyte with a headless CMS?
If content changes in your editing workflow but your warehouse, CRM, support tool, or internal ops app updates once a day, teams end up working from two versions of the truth. Product launches show old descriptions in dashboards. Support teams search stale help articles. Analysts can’t join published content with traffic, conversion, or revenue data until the next batch job finishes.
Connecting Airbyte to a headless CMS solves that data movement problem. Airbyte handles the pipeline from your content source to destinations like Snowflake, BigQuery, Databricks, Postgres, or Elasticsearch. A structured content back end, like Sanity’s Content Lake, gives Airbyte typed JSON instead of HTML pages or rich text blobs, so each sync can move fields like sku, title, locale, category, publish date, and author as clean columns or documents.
The alternative is usually CSV exports, nightly cron jobs, and glue code that breaks when an editor adds a field. With Sanity, a publish event can trigger a webhook, a Function can fetch the exact fields with GROQ, and Airbyte can run the right connection. There’s still setup work, especially if you build a custom Airbyte source, but the integration has clear boundaries: Sanity owns structured content and publish events, while Airbyte owns transport, retries, destinations, and sync history.
Architecture overview
A typical setup starts in Sanity’s Content Lake, where published content lives as structured JSON. A GROQ-powered webhook fires only for the document types you care about, for example product, article, or helpArticle, and only when a draft becomes a published document. The webhook sends the document ID to a Sanity Function or a small webhook endpoint.

The Function fetches the full record from Sanity with GROQ, including referenced fields such as category title, author name, or localized slugs. That keeps the payload small and avoids asking Airbyte to pull extra fields it doesn’t need. The Function then calls Airbyte’s public API, usually POST /v1/jobs with jobType set to sync and the Airbyte connectionId for the pipeline you want to run. For Airbyte Open Source, the same pattern uses your instance’s public API base URL, often /api/public/v1.

Airbyte is usually pull-based, so the job tells Airbyte to run a configured connection. That connection can read from a custom Sanity source, a source built with Airbyte Connector Builder against Sanity’s HTTP API, or a staging endpoint your Function updates. Airbyte then writes to the destination, such as BigQuery for analytics, Postgres for internal apps, or Elasticsearch for search.

The end user sees the result in the downstream experience: a dashboard with fresh content metadata, a search index with the latest article title, or an internal tool that reflects the current product catalog.
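The trigger step in this flow is small enough to sketch. The helper below builds the request a Function would send to Airbyte’s public API; it is a sketch under assumptions, with a placeholder connection ID and the default Airbyte Cloud base URL, and the endpoint shape should be verified against your Airbyte version.

```typescript
// Sketch of the "call Airbyte" step from the architecture above.
// Kept as a pure function so the request shape is easy to unit test.
// The default base URL matches Airbyte Cloud; Open Source instances
// typically expose /api/public/v1 instead.
interface SyncJobRequest {
  url: string;
  method: "POST";
  body: { connectionId: string; jobType: "sync" };
}

function buildSyncJobRequest(
  connectionId: string,
  apiBaseUrl: string = "https://api.airbyte.com/v1"
): SyncJobRequest {
  return {
    url: `${apiBaseUrl}/jobs`,
    method: "POST",
    body: { connectionId, jobType: "sync" },
  };
}

// Example: the request a Function would send for one (placeholder) connection.
const request = buildSyncJobRequest("00000000-0000-0000-0000-000000000000");
console.log(request.url); // https://api.airbyte.com/v1/jobs
```

Keeping request construction pure like this separates the part you can test locally from the network call itself.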
Common use cases
Content analytics in the warehouse
Sync published articles, authors, topics, and publish dates into BigQuery or Snowflake so analysts can join content metadata with traffic, signups, and revenue.
Product catalog data movement
Move product copy, category references, launch dates, and locale fields from Sanity into Postgres, Databricks, or operational tools used by commerce teams.
Search index refreshes
Trigger Airbyte syncs when docs are published so downstream search destinations can receive the latest titles, summaries, tags, and canonical URLs.
Experiment and personalization feeds
Send campaign variants, audience labels, and content status into analytics or personalization systems without asking editors to copy fields into another tool.
Step-by-step integration
1. Set up Airbyte and get API access
Create an Airbyte Cloud workspace or use an Airbyte Open Source instance. In Airbyte Cloud, create an API key from Settings, then Applications. Install the tools you’ll use in your webhook service, for example npm install @sanity/client airbyte-api, or call the Airbyte public API directly with fetch.
2. Create the Airbyte source and destination
Choose the destination first, such as BigQuery, Snowflake, Postgres, Databricks, or Elasticsearch. For the source, use Airbyte Connector Builder against Sanity’s HTTP API, a custom Airbyte source, or a staging endpoint that exposes the JSON your pipeline should read. Save the Airbyte connection ID after you create the connection.
3. Model content in Sanity Studio
Define fields that map cleanly to your Airbyte destination. For a product pipeline, that might include sku, title, slug, price, category reference, locale, status, and publishedAt. Schema-as-code keeps those fields versioned with the rest of your app code.
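As a sketch, a product schema along these lines might back the pipeline. The object below follows Sanity’s schema shape but the specific field names are assumptions taken from the list above; in a real Studio you would wrap it with defineType and defineField from the sanity package.

```typescript
// Hypothetical product schema matching the fields listed above.
// Shown as a plain object (rather than defineType()) so the mapping
// from schema field to destination column stays visible.
const productSchema = {
  name: "product",
  type: "document",
  fields: [
    { name: "sku", type: "string" },
    { name: "title", type: "string" },
    { name: "slug", type: "slug" },
    { name: "price", type: "number" },
    { name: "category", type: "reference", to: [{ type: "category" }] },
    { name: "locale", type: "string" },
    { name: "status", type: "string" },
    { name: "publishedAt", type: "datetime" },
  ],
};

// Each field name maps one-to-one onto a column in the Airbyte destination.
const columnNames = productSchema.fields.map((f) => f.name);
console.log(columnNames.join(", "));
// sku, title, slug, price, category, locale, status, publishedAt
```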
4. Create a filtered webhook or Sanity Function
Add a webhook that fires on publish for the document types Airbyte should sync. A typical filter is _type in ["product", "article"] && !(_id in path("drafts.**")). Send only the document ID in the webhook body, then fetch the full record server-side with GROQ.
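The GROQ filter above can be mirrored in plain code, which is handy for unit-testing the handler without a live webhook. This is a local sketch of the filter’s logic, not a replacement for configuring the filter on the webhook itself.

```typescript
// Local mirror of the webhook filter:
//   _type in ["product", "article"] && !(_id in path("drafts.**"))
// Draft documents in Sanity have IDs prefixed with "drafts.",
// so the path("drafts.**") check becomes a simple prefix test.
const SYNCED_TYPES = ["product", "article"];

function shouldTriggerSync(doc: { _id: string; _type: string }): boolean {
  return SYNCED_TYPES.includes(doc._type) && !doc._id.startsWith("drafts.");
}

console.log(shouldTriggerSync({ _id: "abc123", _type: "product" })); // true
console.log(shouldTriggerSync({ _id: "drafts.abc123", _type: "product" })); // false
console.log(shouldTriggerSync({ _id: "abc123", _type: "author" })); // false
```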
5. Call Airbyte when content changes
In your Function or webhook handler, use @sanity/client to fetch the changed document, then call Airbyte POST /v1/jobs with the connectionId and jobType: "sync". This queues a run for the configured connection instead of waiting for the next scheduled sync.
6. Test the full path
Publish one test document, confirm the webhook fired, check the Airbyte job status in the Airbyte UI or API, and inspect the destination table or index. Then load the frontend, dashboard, or internal tool that reads from that destination and verify the new fields appear as expected.
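Checking the job programmatically can be sketched as a small polling loop. The GET /v1/jobs/{jobId} endpoint and the status values below follow Airbyte’s public API but should be treated as assumptions to confirm against your Airbyte version; the status fetcher is injected so the logic can be tested without a live instance.

```typescript
// Sketch: poll an Airbyte job until it reaches a terminal status.
// The fetcher is injected so tests can stub the API. The terminal
// statuses listed here should be confirmed against your Airbyte
// version's API documentation.
type JobStatus = "pending" | "running" | "succeeded" | "failed" | "cancelled";

const TERMINAL: JobStatus[] = ["succeeded", "failed", "cancelled"];

async function waitForJob(
  jobId: string,
  getStatus: (jobId: string) => Promise<JobStatus>,
  { maxAttempts = 10, delayMs = 0 }: { maxAttempts?: number; delayMs?: number } = {}
): Promise<JobStatus> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = await getStatus(jobId);
    if (TERMINAL.includes(status)) return status;
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  throw new Error(`job ${jobId} did not finish after ${maxAttempts} checks`);
}

// With the real API you would inject a fetch-based getStatus, e.g.
// (id) => fetch(`${base}/jobs/${id}`, { headers }).then(r => r.json()).then(j => j.status)
const fakeStatuses: JobStatus[] = ["pending", "running", "succeeded"];
waitForJob("123", async () => fakeStatuses.shift() ?? "succeeded").then((status) =>
  console.log(status) // succeeded
);
```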
Code example
```typescript
import { createClient } from "@sanity/client";

// Server-side Sanity client; useCdn: false ensures fresh reads right after publish.
const sanity = createClient({
  projectId: process.env.SANITY_PROJECT_ID!,
  dataset: process.env.SANITY_DATASET!,
  apiVersion: "2025-01-01",
  token: process.env.SANITY_READ_TOKEN,
  useCdn: false,
});

export async function POST(req: Request) {
  // The webhook body carries only the document ID; fetch the full record with GROQ.
  const { _id } = await req.json();
  const doc = await sanity.fetch(
    `*[_id == $id][0]{_id, _type, title, "slug": slug.current, "category": category->title, _updatedAt}`,
    { id: _id }
  );
  if (!doc) return Response.json({ skipped: true }, { status: 404 });

  // Queue a run of the configured Airbyte connection via the public API.
  const airbyteRes = await fetch(
    `${process.env.AIRBYTE_API_URL ?? "https://api.airbyte.com/v1"}/jobs`,
    {
      method: "POST",
      headers: {
        Authorization: `Bearer ${process.env.AIRBYTE_API_KEY}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        connectionId: process.env.AIRBYTE_CONNECTION_ID,
        jobType: "sync",
      }),
    }
  );
  if (!airbyteRes.ok) {
    return Response.json({ error: await airbyteRes.text() }, { status: 502 });
  }

  // Cloud and Open Source responses may name the ID differently, so accept both.
  const job = await airbyteRes.json();
  return Response.json({ queued: true, documentId: doc._id, airbyteJob: job.jobId ?? job.id });
}
```

How Sanity + Airbyte works
Build your Airbyte integration on Sanity
Sanity gives you the structured content foundation, real-time event system, and flexible APIs to connect published content with Airbyte pipelines.
CMS approaches to Airbyte
| Capability | Traditional CMS | Sanity |
|---|---|---|
| Content shape for Airbyte | Often exports rendered pages or plugin-specific data, so teams may need parsing before data lands in a warehouse. | Content Lake stores typed JSON, and GROQ can return a destination-ready shape with referenced fields included. |
| Sync timing after publish | Commonly depends on scheduled exports, database dumps, or plugin jobs. | GROQ-powered webhooks can fire only for matching publish events, then call Airbyte right away. |
| Server-side sync logic | Often needs a hosted plugin, cron server, or custom middleware that your team maintains. | Functions can run sync logic on content mutations without separate infrastructure, with 500K invocations per month included. |
| Field-level control | Exports may include full page bodies, admin fields, or extra plugin metadata. | GROQ projections let you send only the fields Airbyte should move, including joined reference data. |
| Multi-destination delivery | Content often gets copied into separate tools for web, analytics, search, and operations. | One structured back end feeds websites, apps, Airbyte destinations, and AI agents through scoped APIs. |
| Trade-offs | Fast to start if a plugin exists, but harder to control data shape and change timing. | Requires thoughtful schema design and Airbyte source setup, but gives developers exact control over content shape and sync triggers. |
Keep building
Explore related integrations to complete your content stack.
Sanity + Zapier
Send Sanity publish events into no-code workflows for Slack alerts, task creation, spreadsheet updates, and lightweight ops handoffs.
Sanity + n8n
Build self-hosted automation flows that react to Sanity webhooks, call external APIs, and update downstream tools.
Sanity + Pipedream
Run small code steps between Sanity webhooks and third-party APIs when you need custom logic without a full service.