AI Content & Workflows · 8 min read

How to Integrate Dify with Your Headless CMS

Connect Dify to your headless CMS so chatbots, agents, and RAG workflows answer from published, structured content instead of stale exports.

Published April 29, 2026
01 — Overview

What is Dify?

Dify is an open-source platform for building LLM applications, including chatbots, agents, retrieval workflows, and internal AI tools. Teams use it to connect model providers, create visual workflows, add Knowledge datasets for RAG, and expose the result through APIs or embedded apps. It’s used by product, support, operations, and engineering teams that need AI workflows they can inspect, test, and ship without building every orchestration layer from scratch.


02 — The case for integration

Why integrate Dify with a headless CMS?

Dify gets much more useful when its Knowledge datasets are fed by the same content your customers and teams already trust. If your support articles, product docs, policies, course material, and marketing pages live in one structured back end, Dify can answer questions from approved content instead of copied Google Docs, CSV uploads, or one-off prompt files.

The hard part is keeping that data current. Without an integration, someone has to remember to export content, clean it, upload it to Dify, wait for indexing, and hope nobody changed the source five minutes later. That breaks quickly when you have 500 product pages, 12 locales, and support content changing every day.

Any headless CMS can work for this, but the quality of the integration depends on how structured the content is and how quickly the CMS can react to change. With Sanity’s AI Content Operating System, content in the Content Lake is typed JSON, GROQ selects only the fields Dify needs, and webhooks or Functions can trigger a sync as soon as content is published. That means Dify receives clean article text, product facts, slugs, categories, locale data, and references without scraping HTML or parsing page blobs.


03 — Architecture

Architecture overview

A typical Sanity and Dify integration starts when an editor publishes or updates content in Sanity Studio. A Sanity webhook, filtered with GROQ to fire only for relevant document types such as article, product, policy, or faq, sends the changed document ID to a server endpoint or Sanity Function. That endpoint uses @sanity/client to fetch the latest document from the Content Lake with GROQ. The query can flatten Portable Text to plain text, resolve references such as category or product family, and include only fields Dify should index. For example, you might send title, slug, locale, summary, bodyText, updatedAt, and related product names, while leaving internal notes out. The server-side handler then calls Dify’s Knowledge API, usually the create-by-text endpoint for new documents or the update-by-text endpoint when you’ve stored the Dify document ID. Dify indexes that text into a Knowledge dataset. Your Dify chatbot, agent, or workflow can then retrieve the relevant chunks during a conversation and return answers to the end user through Dify’s app API, embedded chat UI, or your own frontend.
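
As a rough sketch of the webhook side of this flow, the snippet below builds a GROQ filter for the document types named above and validates the incoming payload before anything is fetched. The helper names (webhookFilter, parseWebhookPayload) and the choice to ignore IDs that start with "drafts." are assumptions for illustration, not part of Sanity’s or Dify’s APIs.

```typescript
// Document types the webhook should fire for (taken from the examples above).
const INDEXED_TYPES = ['article', 'product', 'policy', 'faq'] as const

// Hypothetical helper: build the GROQ filter string you would paste into
// the webhook's filter field.
function webhookFilter(types: readonly string[]): string {
  const list = types.map((t) => JSON.stringify(t)).join(', ')
  return `_type in [${list}]`
}

interface WebhookPayload {
  _id: string
}

// Hypothetical helper: accept only payloads that carry a published document ID.
function parseWebhookPayload(raw: unknown): WebhookPayload | null {
  if (typeof raw !== 'object' || raw === null) return null
  const id = (raw as {_id?: unknown})._id
  if (typeof id !== 'string' || id.length === 0) return null
  // Assumption: draft documents (IDs prefixed with "drafts.") should never be indexed.
  if (/^drafts\./.test(id)) return null
  return {_id: id}
}

const filter = webhookFilter(INDEXED_TYPES)
```

Keeping the payload down to the document ID means the handler always re-reads the latest published version, so a delayed or retried webhook can’t push outdated text into Dify.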


04 — Use cases

Common use cases

🤖

Support chatbot grounded in approved articles

Sync published help center articles from Sanity into a Dify Knowledge dataset so the bot answers from current support content.

🧭

Product advisor for complex catalogs

Send structured product specs, compatibility notes, and buying guides to Dify so an agent can recommend the right product by use case.

🌎

Localized AI answers

Use locale fields from Sanity to index separate English, German, or Japanese content into Dify and route users to the right language.

📝

Internal policy assistant

Publish HR, legal, and security policies in Sanity, sync them to Dify, and let employees ask questions without reading 40-page PDFs.
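
For the localized use case, one possible approach is to keep a separate Dify Knowledge dataset per language and pick the dataset from the document’s locale field. The sketch below assumes that setup; the dataset IDs are placeholders and datasetForLocale is a hypothetical helper, not a Dify or Sanity function.

```typescript
// Placeholder dataset IDs; replace with the IDs of your own Dify datasets.
const DATASETS_BY_LOCALE: Record<string, string | undefined> = {
  en: 'dataset-en-placeholder',
  de: 'dataset-de-placeholder',
  ja: 'dataset-ja-placeholder',
}

// Hypothetical helper: map a locale such as "de-DE" to a dataset ID,
// falling back to a default language when the locale is unknown or missing.
function datasetForLocale(locale: string | undefined, fallback = 'en'): string {
  // Normalize "de-DE" or "de_DE" to the base language code "de".
  const [base = fallback] = (locale ?? fallback).toLowerCase().split(/[-_]/)
  const id = DATASETS_BY_LOCALE[base] ?? DATASETS_BY_LOCALE[fallback]
  if (!id) throw new Error(`No Dify dataset configured for locale "${base}"`)
  return id
}
```

The same function can be reused at query time, so a German user is routed to the dataset that only contains German content.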


05 — Implementation

Step-by-step integration

  1. Set up Dify

    Create a Dify Cloud account or self-host Dify, then create a Knowledge dataset for the content you want to retrieve. Choose your embedding model, keep the dataset ID, and create an API key from Dify’s API access settings. For this sync flow, Dify’s REST API is enough. You’ll use native fetch or your server framework’s HTTP client rather than a required Dify SDK.

  2. Model the source content in Sanity Studio

    Create schemas for the documents Dify should index, such as article, faq, product, policy, or guide. Include fields like title, slug, summary, body, locale, category, audience, and lastReviewedAt. If your Dify app needs filtering, model those filters as fields instead of burying them in body copy.

  3. Create a GROQ query for Dify’s input

    Write a GROQ projection that returns exactly what Dify needs. For example, fetch the document title, current slug, locale, category title, and plain text from Portable Text with pt::text(body). Leave out draft notes, workflow fields, and unpublished references.

  4. Add the sync mechanism

    Create a Sanity webhook filtered to published document changes, or use a Sanity Function if you want the sync logic to run inside Sanity’s server-side event system. The webhook payload can be as small as {"_id": _id}, because the handler will fetch the current document before sending it to Dify.

  5. Call Dify’s Knowledge API

    From your webhook handler or Function, call Dify’s /v1/datasets/{dataset_id}/document/create-by-text endpoint for new content. For production, store the returned Dify document ID in a mapping table or back on the Sanity document so later edits can call update-by-text instead of creating duplicates.

  6. Test retrieval in the user experience

    Publish one test article, confirm it appears in the Dify dataset, then ask the Dify app a question that should retrieve it. Test edits, deletes, locale routing, and long Portable Text documents before you connect the chatbot or agent to production traffic.
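
To make the content-modeling step more concrete, here’s a minimal sketch of an article schema with the fields listed in step 2. It’s written as a plain object so it stands alone; in a real Studio you’d typically wrap it with the defineType and defineField helpers from the sanity package. The locale options and the referenced category type are assumptions.

```typescript
// Sketch of a Sanity document schema for content that Dify should index.
// Assumes a separate "category" document type exists in the same Studio.
const article = {
  name: 'article',
  title: 'Article',
  type: 'document',
  fields: [
    {name: 'title', type: 'string'},
    {name: 'slug', type: 'slug', options: {source: 'title'}},
    {name: 'summary', type: 'text'},
    // Portable Text body; flattened later with pt::text(body) in GROQ.
    {name: 'body', type: 'array', of: [{type: 'block'}]},
    // Filters live in their own fields so they can be queried directly.
    {name: 'locale', type: 'string', options: {list: ['en', 'de', 'ja']}},
    {name: 'category', type: 'reference', to: [{type: 'category'}]},
    {name: 'audience', type: 'string'},
    {name: 'lastReviewedAt', type: 'datetime'},
  ],
}
```

Because locale, category, and audience are separate fields, the GROQ projection in step 3 can select or filter on them without parsing body copy.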


06 — Code

Code example

typescript
import {createClient} from '@sanity/client'

const sanity = createClient({
  projectId: process.env.SANITY_PROJECT_ID!,
  dataset: process.env.SANITY_DATASET!,
  apiVersion: '2025-02-19',
  token: process.env.SANITY_READ_TOKEN,
  useCdn: false
})

export async function POST(req: Request) {
  const {_id} = await req.json()

  const doc = await sanity.fetch(`*[_id == $id][0]{
    title,
    "slug": slug.current,
    locale,
    "category": category->title,
    "bodyText": pt::text(body)
  }`, {id: _id})

  if (!doc) return Response.json({ok: false}, {status: 404})

  // Flatten the selected fields into one plain-text document for Dify to chunk and index.
  const text = [
    `Title: ${doc.title}`,
    `Slug: ${doc.slug}`,
    `Locale: ${doc.locale || 'en'}`,
    `Category: ${doc.category || ''}`,
    doc.bodyText
  ].join('\n\n')

  const res = await fetch(
    `${process.env.DIFY_API_URL || 'https://api.dify.ai/v1'}/datasets/${process.env.DIFY_DATASET_ID}/document/create-by-text`,
    {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${process.env.DIFY_API_KEY}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        name: doc.title,
        text,
        indexing_technique: 'high_quality',
        process_rule: {mode: 'automatic'}
      })
    }
  )

  if (!res.ok) throw new Error(await res.text())
  return Response.json({ok: true, dify: await res.json()})
}
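
The handler above always calls create-by-text, so repeated edits would create duplicates. A minimal sketch of the create-or-update switch described in step 5 might look like the following. The helper names are hypothetical, and both the update-by-text path and the response shape ({document: {id}}) are assumptions you should confirm against the Knowledge API reference for your Dify version.

```typescript
// Flattened document, as produced by the handler above.
interface IndexableDoc {
  title: string
  text: string
}

interface DifyRequest {
  url: string
  method: 'POST'
  body: Record<string, unknown>
}

// Hypothetical helper: pick create-by-text or update-by-text depending on
// whether a Dify document ID is already stored for this Sanity document.
function buildDifyRequest(
  baseUrl: string,
  datasetId: string,
  doc: IndexableDoc,
  difyDocumentId?: string
): DifyRequest {
  if (difyDocumentId) {
    return {
      // Assumed path for updates; verify it for your Dify version.
      url: `${baseUrl}/datasets/${datasetId}/documents/${difyDocumentId}/update-by-text`,
      method: 'POST',
      body: {name: doc.title, text: doc.text},
    }
  }
  return {
    url: `${baseUrl}/datasets/${datasetId}/document/create-by-text`,
    method: 'POST',
    body: {
      name: doc.title,
      text: doc.text,
      indexing_technique: 'high_quality',
      process_rule: {mode: 'automatic'},
    },
  }
}

// Hypothetical helper: read the new document ID from a create response.
// Assumes a shape like {document: {id: "..."}}; returns null otherwise.
function extractDifyDocumentId(payload: unknown): string | null {
  if (typeof payload !== 'object' || payload === null) return null
  const doc = (payload as {document?: unknown}).document
  if (typeof doc !== 'object' || doc === null) return null
  const id = (doc as {id?: unknown}).id
  return typeof id === 'string' && id.length > 0 ? id : null
}
```

After a successful create, store the extracted ID next to the Sanity document ID, then pass it as difyDocumentId on the next publish.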

07 — Why Sanity

How Sanity + Dify works

Build your Dify integration on Sanity

Sanity’s AI Content Operating System gives you the structured content foundation, real-time event system, and flexible APIs to connect Dify without scraping pages or running scheduled exports.

08 — Comparison

CMS approaches to Dify

| Capability | Traditional CMS | Sanity |
| --- | --- | --- |
| Structured data for Dify indexing | Content often mixes copy, layout, plugin output, and shortcodes, so Dify ingestion usually needs cleanup. | The Content Lake keeps typed JSON, and GROQ can return title, bodyText, locale, categories, and references in one payload. |
| Real-time sync after publish | Teams often rely on scheduled exports, plugins, or manual uploads to refresh Dify datasets. | GROQ-filtered webhooks or Functions can trigger only on the document types and publish events Dify should index. |
| Field-level control for AI context | Dify may receive full rendered pages, including navigation, cookie text, and unrelated modules. | GROQ projections select the exact fields, resolve references, and exclude internal notes before calling Dify. |
| Handling edits and deletes | Content changes can leave stale chunks in Dify unless someone runs a cleanup job. | Mutation events include create, update, and delete signals. You’ll still need to persist Dify document IDs for clean updates. |
| Multi-channel content use | AI ingestion is often separate from website publishing, which creates duplicate content paths. | One structured back end can feed web, mobile, Dify, Agent Context, and other AI workflows with channel-specific GROQ queries. |
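
To illustrate the edits-and-deletes row, here’s a small sketch of how a handler could decide what to do with each mutation event once Dify document IDs are persisted. The in-memory mapping stands in for whatever store you use, planSync is a hypothetical helper, and how you read the operation (for example, from the webhook request or a GROQ projection) depends on your webhook configuration.

```typescript
type Operation = 'create' | 'update' | 'delete'

type SyncAction =
  | {kind: 'create'}
  | {kind: 'update'; difyDocumentId: string}
  | {kind: 'delete'; difyDocumentId: string}
  | {kind: 'skip'}

// Hypothetical helper: decide which Dify call a mutation event needs,
// given a mapping from Sanity document IDs to Dify document IDs.
function planSync(
  operation: Operation,
  sanityId: string,
  mapping: Record<string, string | undefined>
): SyncAction {
  const difyDocumentId = mapping[sanityId]
  if (operation === 'delete') {
    // Nothing to remove if the document was never indexed.
    return difyDocumentId ? {kind: 'delete', difyDocumentId} : {kind: 'skip'}
  }
  // A known ID means the text already exists in Dify, so update it in place.
  return difyDocumentId ? {kind: 'update', difyDocumentId} : {kind: 'create'}
}
```

Treating an update for an unmapped document as a create keeps the sync self-healing if an earlier event was missed.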

09 — Next steps

Keep building

Explore related integrations to complete your content stack.

Ready to try Sanity?

See how Sanity's Content Operating System powers integrations with Dify and 200+ other tools.