How to Integrate BigQuery with Your Headless CMS
Connect BigQuery to your headless CMS so published content, campaign metadata, and editorial dimensions are ready for near real-time analytics and BI.
What is BigQuery?
BigQuery is Google Cloud's serverless data warehouse for SQL analytics across large datasets, from gigabytes to petabytes. Data teams use it for event analysis, BI dashboards, machine learning features, and joining data from tools like Google Analytics 4, Ads, Segment, and product databases. It's one of the main warehouse choices for teams already running on Google Cloud.
Why integrate BigQuery with a headless CMS?
Content analytics gets messy when your warehouse only sees page URLs and event names. A BigQuery integration lets you join behavioral data with structured editorial fields like author, topic, locale, campaign, publish date, content type, and experiment variant. Instead of asking, "How did /blog/foo perform?", you can ask, "Which Spanish product guides written by the growth team generated demo requests within 7 days of publish?"
A headless CMS with structured content and real-time webhooks gives BigQuery clean change events instead of scraped HTML, CSV exports, or nightly crawlers. With Sanity, content lives as typed JSON in the Content Lake, GROQ selects the exact fields your warehouse needs, and webhooks or Functions send updates when a document is published, updated, or deleted.
The trade-off is schema work. You need to decide how content maps to BigQuery tables, how to handle references, and whether high-volume changes should stream directly or pass through Pub/Sub. But once that shape is defined, analysts get reliable dimensions without asking developers to backfill metadata every time a content model changes.
Architecture overview
A typical Sanity to BigQuery flow starts when an editor publishes or updates a document in Sanity Studio. A GROQ-powered webhook filters for the documents you care about, for example published articles, landing pages, or campaign pages, and sends the document ID to a serverless endpoint. You can also run the same logic in a Sanity Function so the sync code runs on content mutations without standing up separate infrastructure.

The handler uses @sanity/client to fetch the current document from the Content Lake with a GROQ projection. That projection can flatten references before the row reaches BigQuery, such as author->name, categories[]->slug.current, or campaign->utmCode. The handler then calls BigQuery through the @google-cloud/bigquery SDK, which uses the BigQuery tabledata.insertAll API under the hood for streaming inserts.

For smaller editorial volumes, direct table inserts are usually enough. For thousands of mutations per minute, put Pub/Sub between the webhook and BigQuery, or use the BigQuery Storage Write API. Once rows land in BigQuery, analysts can join them with GA4 exports, ad spend, CRM data, or product events and expose results in Looker Studio, Looker, Mode, Hex, dbt, or custom dashboards.
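As a sketch of that projection step, the GROQ query below flattens referenced documents into flat, warehouse-friendly columns before the row reaches BigQuery. The field names (author, categories, campaign, utmCode) are illustrative assumptions; adapt them to your own content model.

```typescript
// Illustrative GROQ projection: fetch one published document by ID and
// flatten referenced documents into scalar fields a warehouse can query.
// Field names are assumptions; adjust them to your own schema.
export const contentRowProjection = `
  *[_id == $id][0]{
    "document_id": _id,
    "content_type": _type,
    title,
    "slug": slug.current,
    publishedAt,
    locale,
    "author_name": author->name,
    "topic_slugs": categories[]->slug.current,
    "campaign_utm": campaign->utmCode
  }
`
```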
Common use cases
Content performance warehouse
Sync article metadata into BigQuery and join it with GA4 page_view, scroll, signup, and purchase events by slug or canonical URL.
Experiment and campaign analysis
Send content variant IDs, campaign codes, and publish windows to BigQuery so analysts can compare conversion by version.
Localization coverage reporting
Track locale, market, publish status, and translation relationships in BigQuery to find regional content gaps.
Recommendation and ML feature tables
Combine content taxonomy from Sanity with engagement events in BigQuery ML for recommendations, propensity models, and audience scoring.
Step-by-step integration
1. Set up BigQuery credentials
Create or choose a Google Cloud project, enable the BigQuery API, create a dataset such as content_analytics, and add a table such as sanity_content. Create a service account with BigQuery Data Editor access on the dataset, plus BigQuery Job User if you plan to run load or query jobs. For local work, set GOOGLE_APPLICATION_CREDENTIALS to the path of the service account key file. In production, prefer Workload Identity or your host's secret manager over a checked-in key file.
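If you prefer to create the dataset and table from code rather than the console, a one-off script with the Node SDK could look like the sketch below. The column names and types are assumptions based on the fields discussed in this guide.

```typescript
// One-off setup sketch: create the dataset and table with @google-cloud/bigquery.
// Column names and types are assumptions; adjust them to your reporting needs.
import {BigQuery} from '@google-cloud/bigquery'

const bigquery = new BigQuery({projectId: process.env.GCP_PROJECT_ID})

async function setup() {
  const [dataset] = await bigquery.createDataset('content_analytics', {location: 'US'})
  await dataset.createTable('sanity_content', {
    schema: [
      {name: 'document_id', type: 'STRING', mode: 'REQUIRED'},
      {name: 'content_type', type: 'STRING'},
      {name: 'title', type: 'STRING'},
      {name: 'slug', type: 'STRING'},
      {name: 'locale', type: 'STRING'},
      {name: 'author_name', type: 'STRING'},
      {name: 'campaign_utm', type: 'STRING'},
      {name: 'publishedAt', type: 'TIMESTAMP'},
      {name: 'synced_at', type: 'TIMESTAMP'},
    ],
  })
}

setup().catch(console.error)
```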
2. Install the SDKs
Install the BigQuery and Sanity clients in your sync endpoint with npm install @google-cloud/bigquery @sanity/client. Keep SANITY_PROJECT_ID, SANITY_DATASET, SANITY_API_VERSION, SANITY_READ_TOKEN, GCP_PROJECT_ID, BQ_DATASET, and BQ_TABLE in environment variables.
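A minimal setup module that reads those variables might look like this sketch; the fallback API version is an assumption, and credentials resolve from GOOGLE_APPLICATION_CREDENTIALS locally or from the attached service account in production.

```typescript
// Minimal client setup sketch using the environment variables listed above.
import {createClient} from '@sanity/client'
import {BigQuery} from '@google-cloud/bigquery'

export const sanity = createClient({
  projectId: process.env.SANITY_PROJECT_ID!,
  dataset: process.env.SANITY_DATASET!,
  apiVersion: process.env.SANITY_API_VERSION ?? '2024-01-01', // assumed fallback
  token: process.env.SANITY_READ_TOKEN, // needed for private datasets
  useCdn: false, // read fresh content, not the CDN cache
})

export const bigquery = new BigQuery({
  projectId: process.env.GCP_PROJECT_ID,
  // Credentials come from GOOGLE_APPLICATION_CREDENTIALS locally,
  // or from the runtime's service account in production.
})
```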
3. Model analytics-ready fields in Sanity Studio
Add fields that analysts can query without guessing: title, slug, publishedAt, canonicalUrl, author reference, topics references, locale, campaignId, experimentVariant, and contentType. Use stable IDs and date fields. Avoid burying analytics labels inside rich text where BigQuery has to parse them later.
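A trimmed-down document type with those fields could look like the following sketch, using defineType and defineField from the sanity package. The referenced author and topic type names are placeholders for your own schema.

```typescript
// Sketch of an analytics-friendly document type. The 'author' and 'topic'
// reference targets are placeholders for types defined elsewhere in your schema.
import {defineField, defineType} from 'sanity'

export const post = defineType({
  name: 'post',
  title: 'Post',
  type: 'document',
  fields: [
    defineField({name: 'title', type: 'string'}),
    defineField({name: 'slug', type: 'slug', options: {source: 'title'}}),
    defineField({name: 'publishedAt', type: 'datetime'}),
    defineField({name: 'canonicalUrl', type: 'url'}),
    defineField({name: 'author', type: 'reference', to: [{type: 'author'}]}),
    defineField({
      name: 'topics',
      type: 'array',
      of: [{type: 'reference', to: [{type: 'topic'}]}],
    }),
    defineField({name: 'locale', type: 'string'}),
    defineField({name: 'campaignId', type: 'string'}),
    defineField({name: 'experimentVariant', type: 'string'}),
  ],
})
```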
4. Create the sync trigger
Create a Sanity webhook filtered to published documents, for example !(_id in path("drafts.**")) && _type in ["post", "landingPage"]. Send the webhook to your endpoint with a shared secret. If you don't want to operate a separate endpoint, use a Sanity Function to run the same server-side code on content mutations.
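When the webhook carries a shared secret, you can verify its signature before trusting the payload. The sketch below assumes the @sanity/webhook helper package and a runtime that gives you the raw request body; treat the env variable name and payload shape as assumptions to check against your setup.

```typescript
// Sketch: verify a Sanity webhook signature before processing the payload.
// Assumes the @sanity/webhook package; the signature must be checked against
// the raw (unparsed) request body.
import {isValidSignature, SIGNATURE_HEADER_NAME} from '@sanity/webhook'

const secret = process.env.SANITY_WEBHOOK_SECRET! // hypothetical env var

export async function parseWebhook(
  rawBody: string,
  headers: Record<string, string | undefined>
) {
  const signature = headers[SIGNATURE_HEADER_NAME]
  if (!signature || !(await isValidSignature(rawBody, signature, secret))) {
    throw new Error('Invalid webhook signature')
  }
  // The payload shape depends on the webhook's projection; by default it
  // includes the changed document, so _id and _type are available here.
  return JSON.parse(rawBody) as {_id: string; _type: string}
}
```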
5. Write rows to BigQuery
In the handler, fetch the latest document from the Content Lake with GROQ, project references into flat fields, and call BigQuery table.insert(). Use streaming inserts for low to moderate publish volume. For larger pipelines, send webhook events to Pub/Sub, then batch into BigQuery with Dataflow, Cloud Run, or the Storage Write API.
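Putting the pieces together, a handler along these lines fetches the projected document and streams one row into BigQuery. It reuses the clients and GROQ projection sketched earlier, so the import paths are assumptions about your module layout, and error handling is kept minimal.

```typescript
// Sketch of the sync step: fetch the projected document from the Content Lake
// and stream it into BigQuery. Import paths assume the modules defined earlier.
import {sanity, bigquery} from './clients'
import {contentRowProjection} from './projection'

export async function syncDocument(documentId: string) {
  const row = await sanity.fetch(contentRowProjection, {id: documentId})
  if (!row) return // the document may have been unpublished since the webhook fired

  await bigquery
    .dataset(process.env.BQ_DATASET!)
    .table(process.env.BQ_TABLE!)
    .insert([{...row, synced_at: new Date().toISOString()}])
}
```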
6. Test the data and build reporting
Publish a test document, confirm one row appears in BigQuery, and run a query that joins content metadata with your event table. Then connect Looker Studio, Looker, dbt, Hex, or another analytics tool to build reports for content performance, locale coverage, and campaign results.
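To confirm the join works end to end, you can run a quick query from the same Node environment. The ga4_events table and its columns below are hypothetical placeholders; substitute your own GA4 export or event table.

```typescript
// Sketch: join synced content metadata with an event table.
// `content_analytics.ga4_events` and its columns are hypothetical; substitute
// your own GA4 export or event table.
import {BigQuery} from '@google-cloud/bigquery'

const bigquery = new BigQuery({projectId: process.env.GCP_PROJECT_ID})

async function checkJoin() {
  const [rows] = await bigquery.query({
    query: `
      SELECT c.title, c.author_name, c.locale, COUNT(e.event_name) AS page_views
      FROM \`content_analytics.sanity_content\` AS c
      LEFT JOIN \`content_analytics.ga4_events\` AS e
        ON e.page_path = CONCAT('/blog/', c.slug)
        AND e.event_name = 'page_view'
      GROUP BY c.title, c.author_name, c.locale
      ORDER BY page_views DESC
      LIMIT 20
    `,
  })
  console.table(rows)
}

checkJoin().catch(console.error)
```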
How Sanity + BigQuery works
Build your BigQuery integration on Sanity
Sanity gives you the structured content foundation, real-time event system, and flexible APIs to connect content operations with BigQuery.
Start building free →

CMS approaches to BigQuery
| Capability | Traditional CMS | Sanity |
|---|---|---|
| Analytics-ready content shape | Page HTML and plugin fields often need parsing before BigQuery can query them. | Typed JSON in the Content Lake maps to BigQuery rows, with references joined through GROQ before sync. |
| Change capture for warehouse sync | Exports are often nightly, plugin-based, or tied to full page publish events. | GROQ-filtered webhooks and Functions can run sync logic when published content changes. |
| Field-level control | Analytics fields may be mixed with theme settings, plugin data, or rendered markup. | GROQ projections select exactly the fields BigQuery needs, including joined reference data. |
| Content performance analysis | Teams often analyze by URL only, which hides author, topic, locale, and campaign context. | Structured fields like author, topic, locale, and campaign become warehouse dimensions, and schema-as-code keeps them versioned, reviewable, and consistent across content types. |
| High-volume sync strategy | Large exports can require database access, plugins, or custom ETL jobs. | Direct webhook sync works for moderate volume, while Pub/Sub or the BigQuery Storage Write API can handle higher traffic patterns. |
Keep building
Explore related integrations to complete your content stack.
Sanity + Google Analytics
Join GA4 behavior data with structured Sanity fields like topic, author, locale, and campaign in BigQuery.
Sanity + Segment
Send customer events through Segment and combine them with Sanity content metadata for audience and funnel analysis.
Sanity + Snowflake
Sync structured content into Snowflake when your analytics stack runs outside Google Cloud or spans multiple warehouses.