How to Integrate Airbyte with Your Headless CMS
Keep Airbyte pipelines current when content changes by syncing structured headless CMS data into warehouses, apps, and automation workflows within minutes.
What is Airbyte?
Airbyte is an open-source data movement platform for ELT pipelines. Teams use it to move data between APIs, databases, warehouses, lakehouses, and operational tools through 550+ connectors. It’s common in data engineering, analytics, and operations teams that need repeatable syncs without writing a new connector for every source and destination.
Why integrate Airbyte with a headless CMS?
If content changes in your editing workflow but your warehouse, CRM, support tool, or internal ops app updates once a day, teams end up working from two versions of the truth. Product launches show old descriptions in dashboards. Support teams search stale help articles. Analysts can’t join published content with traffic, conversion, or revenue data until the next batch job finishes.
Connecting Airbyte to a headless CMS solves that data movement problem. Airbyte handles the pipeline from your content source to destinations like Snowflake, BigQuery, Databricks, Postgres, or Elasticsearch. A structured content back end, like Sanity’s Content Lake, gives Airbyte typed JSON instead of HTML pages or rich text blobs, so each sync can move fields like sku, title, locale, category, publish date, and author as clean columns or documents.
The alternative is usually CSV exports, nightly cron jobs, and glue code that breaks when an editor adds a field. With Sanity, a publish event can trigger a webhook, a Function can fetch the exact fields with GROQ, and Airbyte can run the right connection. There’s still setup work, especially if you build a custom Airbyte source, but the integration has clear boundaries: Sanity owns structured content and publish events, while Airbyte owns transport, retries, destinations, and sync history.
Architecture overview
A typical setup starts in Sanity’s Content Lake, where published content lives as structured JSON. A GROQ-powered webhook fires only for the document types you care about, for example product, article, or helpArticle, and only when a draft becomes a published document. The webhook sends the document ID to a Sanity Function or a small webhook endpoint.

The Function fetches the full record from Sanity with GROQ, including referenced fields such as category title, author name, or localized slugs. That keeps the payload small and avoids asking Airbyte to pull extra fields it doesn’t need. The Function then calls Airbyte’s public API, usually POST /v1/jobs with jobType set to sync and the Airbyte connectionId for the pipeline you want to run. For Airbyte Open Source, the same pattern uses your instance’s public API base URL, often /api/public/v1.

Airbyte is usually pull-based, so the job tells Airbyte to run a configured connection. That connection can read from a custom Sanity source, a source built with Airbyte Connector Builder against Sanity’s HTTP API, or a staging endpoint your Function updates. Airbyte then writes to the destination, such as BigQuery for analytics, Postgres for internal apps, or Elasticsearch for search.

The end user sees the result in the downstream experience: a dashboard with fresh content metadata, a search index with the latest article title, or an internal tool that reflects the current product catalog.
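The trigger step in this flow is small enough to sketch. The helper below builds the request a Function would send to Airbyte’s public API; it is a sketch under assumptions, with a placeholder connection ID and the default Airbyte Cloud base URL, and the endpoint shape should be verified against your Airbyte version.

```typescript
// Sketch of the "call Airbyte" step from the architecture above.
// Kept as a pure function so the request shape is easy to unit test.
// The default base URL matches Airbyte Cloud; Open Source instances
// typically expose /api/public/v1 instead.
interface SyncJobRequest {
  url: string;
  method: "POST";
  body: { connectionId: string; jobType: "sync" };
}

function buildSyncJobRequest(
  connectionId: string,
  apiBaseUrl: string = "https://api.airbyte.com/v1"
): SyncJobRequest {
  return {
    url: `${apiBaseUrl}/jobs`,
    method: "POST",
    body: { connectionId, jobType: "sync" },
  };
}

// Example: the request a Function would send for one (placeholder) connection.
const request = buildSyncJobRequest("00000000-0000-0000-0000-000000000000");
console.log(request.url); // https://api.airbyte.com/v1/jobs
```

Keeping request construction pure like this separates the part you can test locally from the network call itself.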
Common use cases
Content analytics in the warehouse
Sync published articles, authors, topics, and publish dates into BigQuery or Snowflake so analysts can join content metadata with traffic, signups, and revenue.
Product catalog data movement
Move product copy, category references, launch dates, and locale fields from Sanity into Postgres, Databricks, or operational tools used by commerce teams.
Search index refreshes
Trigger Airbyte syncs when docs are published so downstream search destinations can receive the latest titles, summaries, tags, and canonical URLs.
Experiment and personalization feeds
Send campaign variants, audience labels, and content status into analytics or personalization systems without asking editors to copy fields into another tool.
Step-by-step integration
1. Set up Airbyte and get API access
Create an Airbyte Cloud workspace or use an Airbyte Open Source instance. In Airbyte Cloud, create an API key from Settings, then Applications. Install the tools you’ll use in your webhook service, for example npm install @sanity/client airbyte-api, or call the Airbyte public API directly with fetch.
2. Create the Airbyte source and destination
Choose the destination first, such as BigQuery, Snowflake, Postgres, Databricks, or Elasticsearch. For the source, use Airbyte Connector Builder against Sanity’s HTTP API, a custom Airbyte source, or a staging endpoint that exposes the JSON your pipeline should read. Save the Airbyte connection ID after you create the connection.
3. Model content in Sanity Studio
Define fields that map cleanly to your Airbyte destination. For a product pipeline, that might include sku, title, slug, price, category reference, locale, status, and publishedAt. Schema-as-code keeps those fields versioned with the rest of your app code.
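As a sketch, a product schema along these lines might back the pipeline. The object below follows Sanity’s schema shape but the specific field names are assumptions taken from the list above; in a real Studio you would wrap it with defineType and defineField from the sanity package.

```typescript
// Hypothetical product schema matching the fields listed above.
// Shown as a plain object (rather than defineType()) so the mapping
// from schema field to destination column stays visible.
const productSchema = {
  name: "product",
  type: "document",
  fields: [
    { name: "sku", type: "string" },
    { name: "title", type: "string" },
    { name: "slug", type: "slug" },
    { name: "price", type: "number" },
    { name: "category", type: "reference", to: [{ type: "category" }] },
    { name: "locale", type: "string" },
    { name: "status", type: "string" },
    { name: "publishedAt", type: "datetime" },
  ],
};

// Each field name maps one-to-one onto a column in the Airbyte destination.
const columnNames = productSchema.fields.map((f) => f.name);
console.log(columnNames.join(", "));
// sku, title, slug, price, category, locale, status, publishedAt
```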
4. Create a filtered webhook or Sanity Function
Add a webhook that fires on publish for the document types Airbyte should sync. A typical filter is _type in ["product", "article"] && !(_id in path("drafts.**")). Send only the document ID in the webhook body, then fetch the full record server-side with GROQ.
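The GROQ filter above can be mirrored in plain code, which is handy for unit-testing the handler without a live webhook. This is a local sketch of the filter’s logic, not a replacement for configuring the filter on the webhook itself.

```typescript
// Local mirror of the webhook filter:
//   _type in ["product", "article"] && !(_id in path("drafts.**"))
// Draft documents in Sanity have IDs prefixed with "drafts.",
// so the path("drafts.**") check becomes a simple prefix test.
const SYNCED_TYPES = ["product", "article"];

function shouldTriggerSync(doc: { _id: string; _type: string }): boolean {
  return SYNCED_TYPES.includes(doc._type) && !doc._id.startsWith("drafts.");
}

console.log(shouldTriggerSync({ _id: "abc123", _type: "product" })); // true
console.log(shouldTriggerSync({ _id: "drafts.abc123", _type: "product" })); // false
console.log(shouldTriggerSync({ _id: "abc123", _type: "author" })); // false
```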
5. Call Airbyte when content changes
In your Function or webhook handler, use @sanity/client to fetch the changed document, then call Airbyte POST /v1/jobs with the connectionId and jobType: "sync". This queues a run for the configured connection instead of waiting for the next scheduled sync.
6. Test the full path
Publish one test document, confirm the webhook fired, check the Airbyte job status in the Airbyte UI or API, and inspect the destination table or index. Then load the frontend, dashboard, or internal tool that reads from that destination and verify the new fields appear as expected.
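Checking the job programmatically can be sketched as a small polling loop. The GET /v1/jobs/{jobId} endpoint and the status values below follow Airbyte’s public API but should be treated as assumptions to confirm against your Airbyte version; the status fetcher is injected so the logic can be tested without a live instance.

```typescript
// Sketch: poll an Airbyte job until it reaches a terminal status.
// The fetcher is injected so tests can stub the API. The terminal
// statuses listed here should be confirmed against your Airbyte
// version's API documentation.
type JobStatus = "pending" | "running" | "succeeded" | "failed" | "cancelled";

const TERMINAL: JobStatus[] = ["succeeded", "failed", "cancelled"];

async function waitForJob(
  jobId: string,
  getStatus: (jobId: string) => Promise<JobStatus>,
  { maxAttempts = 10, delayMs = 0 }: { maxAttempts?: number; delayMs?: number } = {}
): Promise<JobStatus> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = await getStatus(jobId);
    if (TERMINAL.includes(status)) return status;
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  throw new Error(`job ${jobId} did not finish after ${maxAttempts} checks`);
}

// With the real API you would inject a fetch-based getStatus, e.g.
// (id) => fetch(`${base}/jobs/${id}`, { headers }).then(r => r.json()).then(j => j.status)
const fakeStatuses: JobStatus[] = ["pending", "running", "succeeded"];
waitForJob("123", async () => fakeStatuses.shift() ?? "succeeded").then((status) =>
  console.log(status) // succeeded
);
```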
Code example
```typescript
import { createClient } from "@sanity/client";

// Server-side Sanity client; useCdn: false ensures fresh reads right after publish.
const sanity = createClient({
  projectId: process.env.SANITY_PROJECT_ID!,
  dataset: process.env.SANITY_DATASET!,
  apiVersion: "2025-01-01",
  token: process.env.SANITY_READ_TOKEN,
  useCdn: false,
});

export async function POST(req: Request) {
  // The webhook body carries only the document ID; fetch the full record with GROQ.
  const { _id } = await req.json();
  const doc = await sanity.fetch(
    `*[_id == $id][0]{_id, _type, title, "slug": slug.current, "category": category->title, _updatedAt}`,
    { id: _id }
  );
  if (!doc) return Response.json({ skipped: true }, { status: 404 });

  // Queue a run of the configured Airbyte connection via the public API.
  const airbyteRes = await fetch(
    `${process.env.AIRBYTE_API_URL ?? "https://api.airbyte.com/v1"}/jobs`,
    {
      method: "POST",
      headers: {
        Authorization: `Bearer ${process.env.AIRBYTE_API_KEY}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        connectionId: process.env.AIRBYTE_CONNECTION_ID,
        jobType: "sync",
      }),
    }
  );
  if (!airbyteRes.ok) {
    return Response.json({ error: await airbyteRes.text() }, { status: 502 });
  }

  // Cloud and Open Source responses may name the ID differently, so accept both.
  const job = await airbyteRes.json();
  return Response.json({ queued: true, documentId: doc._id, airbyteJob: job.jobId ?? job.id });
}
```

How Sanity + Airbyte works
Build your Airbyte integration on Sanity
Sanity gives you the structured content foundation, real-time event system, and flexible APIs to connect published content with Airbyte pipelines.
CMS approaches to Airbyte
| Capability | Traditional CMS | Sanity |
|---|---|---|
| Content shape for Airbyte | Often exports rendered pages or plugin-specific data, so teams may need parsing before data lands in a warehouse. | Content Lake stores typed JSON, and GROQ can return a destination-ready shape with referenced fields included. |
| Sync timing after publish | Commonly depends on scheduled exports, database dumps, or plugin jobs. | GROQ-powered webhooks can fire only for matching publish events, then call Airbyte right away. |
| Server-side sync logic | Often needs a hosted plugin, cron server, or custom middleware that your team maintains. | Functions can run sync logic on content mutations without separate infrastructure, with 500K invocations per month included. |
| Field-level control | Exports may include full page bodies, admin fields, or extra plugin metadata. | GROQ projections let you send only the fields Airbyte should move, including joined reference data. |
| Multi-destination delivery | Content often gets copied into separate tools for web, analytics, search, and operations. | One structured back end feeds websites, apps, Airbyte destinations, and AI agents through scoped APIs. |
| Trade-offs | Fast to start if a plugin exists, but harder to control data shape and change timing. | Requires thoughtful schema design and Airbyte source setup, but gives developers exact control over content shape and sync triggers. |
Keep building
Explore related integrations to complete your content stack.
Sanity + Zapier
Send Sanity publish events into no-code workflows for Slack alerts, task creation, spreadsheet updates, and lightweight ops handoffs.
Sanity + n8n
Build self-hosted automation flows that react to Sanity webhooks, call external APIs, and update downstream tools.
Sanity + Pipedream
Run small code steps between Sanity webhooks and third-party APIs when you need custom logic without a full service.