How to Integrate Screaming Frog with Your Headless CMS
Connect Screaming Frog to structured content so every publish can trigger targeted crawls, catch SEO regressions, and help teams fix issues before they reach search results.
What is Screaming Frog?
Screaming Frog is a desktop-based SEO crawler used by technical SEO teams, agencies, and site owners to audit URLs, metadata, redirects, canonicals, status codes, hreflang, structured data, and more. Its SEO Spider is widely used for technical audits because it can crawl small sites, large sites, staging environments, and URL lists with configurable extraction and export settings.
Why integrate Screaming Frog with a headless CMS?
SEO issues usually show up after content changes. An editor updates a title, a developer changes a route, or a localization team publishes 400 translated pages, and nobody notices that 37 pages now have missing canonicals or duplicate H1s until the next scheduled audit. Connecting Screaming Frog to your content workflow lets you run focused crawls when content changes, instead of waiting for a monthly site-wide scan.
Architecture overview
A common flow starts when an editor publishes or updates a document in Sanity Studio. A Sanity webhook fires on the publish mutation, filtered with GROQ so only SEO-relevant document types, such as page, article, product, or landingPage, trigger the workflow. The webhook calls a Sanity Function or a small middleware endpoint.
That code uses @sanity/client and a GROQ query to fetch the changed document, join referenced fields, and build the exact public URL or a short URL list for nearby pages that should be checked. Because Screaming Frog SEO Spider does not expose a hosted REST API, the middleware calls the installed SEO Spider command-line binary, for example screamingfrogseospider, with headless list-mode crawl options.
The runner writes a urls.txt file, starts the crawl, exports tabs such as Internal:All, Page Titles:Missing, Meta Description:Duplicate, H1:Missing, Response Codes:Client Error (4xx), and Canonicals:Missing, then saves the CSV output to a shared location or parses it into your reporting system. The SEO team reviews the issues, fixes the source fields in Sanity Studio, and the published site updates for visitors and search crawlers.
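As a minimal sketch of the URL-list step, the helper below fetches the changed page plus any sibling pages that share its parent reference and maps them to public URLs. The page document type, the slug and parent field names, the base URL, and the buildUrlList helper itself are assumptions for illustration; adjust them to your own schema and routing.

import {createClient} from '@sanity/client';

const client = createClient({
  projectId: process.env.SANITY_PROJECT_ID!,
  dataset: process.env.SANITY_DATASET!,
  apiVersion: '2025-01-01',
  token: process.env.SANITY_READ_TOKEN,
  useCdn: false
});

// Hypothetical helper: collect the changed page and its siblings (pages that
// reference the same parent) as absolute URLs for a Screaming Frog list-mode crawl.
export async function buildUrlList(publishedId: string): Promise<string[]> {
  const pages: {url: string | null}[] = await client.fetch(
    `*[_type == "page" && (_id == $id || (defined(parent) && parent._ref == *[_id == $id][0].parent._ref))]{
      "url": "https://www.example.com/" + slug.current
    }`,
    {id: publishedId}
  );
  // Drop pages without a slug; concatenating a string with null yields null in GROQ.
  return pages.map((p) => p.url).filter((u): u is string => Boolean(u));
}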
Common use cases
Crawl new pages after publish
Run a Screaming Frog list-mode crawl for newly published URLs and catch missing titles, 404s, noindex tags, and canonical issues within minutes.
Check localized URL sets
When a market publishes translated pages, crawl the locale-specific URLs and export hreflang, status code, and canonical reports for that language.
Audit redirects after slug changes
Use Sanity webhook events to detect slug updates, crawl old and new URLs, and verify 301 behavior before search engines recrawl the page.
Validate metadata at scale
Compare structured title, description, Open Graph, and canonical fields in Sanity against what Screaming Frog finds in rendered HTML.
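For the locale and metadata checks above, a single query can return the Sanity-side values next to the URLs Screaming Frog will crawl. This is a sketch that assumes the field names used in this guide (slug, seoTitle, seoDescription, canonicalUrl, noindex, language) and an example base URL; the fetchSeoFields helper is hypothetical.

import {createClient} from '@sanity/client';

const client = createClient({
  projectId: process.env.SANITY_PROJECT_ID!,
  dataset: process.env.SANITY_DATASET!,
  apiVersion: '2025-01-01',
  token: process.env.SANITY_READ_TOKEN,
  useCdn: false
});

// Hypothetical helper: list published pages for one language with their structured
// SEO fields, ready to diff against the title, description, and canonical columns
// in a Screaming Frog export.
export async function fetchSeoFields(language: string) {
  return client.fetch(
    `*[_type == "page" && language == $language && !(_id in path("drafts.**"))]{
      "url": "https://www.example.com/" + language + "/" + slug.current,
      seoTitle,
      seoDescription,
      canonicalUrl,
      noindex
    }`,
    {language}
  );
}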
Step-by-step integration
1. Install and license Screaming Frog SEO Spider
Install SEO Spider on the machine that will run crawls, activate a paid license if you need more than the free crawl limits, and confirm the command-line binary works with a test command such as screamingfrogseospider --headless --crawl https://example.com.
2. Create a reusable Screaming Frog configuration
In the SEO Spider app, configure crawl settings, rendering mode, user agent, authentication if needed, custom extraction, and PageSpeed Insights settings if you use that API. Save the configuration as a .seospiderconfig file for the runner.
3. Model SEO fields in Sanity Studio
Add fields such as slug, seoTitle, seoDescription, canonicalUrl, noindex, language, market, and parent references to the relevant schemas. This gives your crawl workflow typed source data instead of relying on HTML scraping.
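A schema along these lines is one way to model those fields; the names, titles, and parent reference below are illustrative rather than required by Sanity or Screaming Frog.

import {defineField, defineType} from 'sanity';

// Illustrative page schema carrying the SEO fields the crawl workflow reads.
export const page = defineType({
  name: 'page',
  title: 'Page',
  type: 'document',
  fields: [
    defineField({name: 'title', title: 'Title', type: 'string'}),
    defineField({name: 'slug', title: 'Slug', type: 'slug', options: {source: 'title'}}),
    defineField({name: 'seoTitle', title: 'SEO title', type: 'string'}),
    defineField({name: 'seoDescription', title: 'SEO description', type: 'text'}),
    defineField({name: 'canonicalUrl', title: 'Canonical URL', type: 'url'}),
    defineField({name: 'noindex', title: 'Hide from search engines', type: 'boolean', initialValue: false}),
    defineField({name: 'language', title: 'Language', type: 'string'}),
    defineField({name: 'market', title: 'Market', type: 'string'}),
    defineField({name: 'parent', title: 'Parent page', type: 'reference', to: [{type: 'page'}]})
  ]
});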
4. Add a publish webhook or Sanity Function trigger
Create a webhook that fires on publish events for page-like documents. Use a GROQ filter such as _type in ['page','article','product'] so crawl automation does not run for unrelated content changes.
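In the webhook settings this usually means a GROQ filter plus a small projection so the runner receives only the identifiers it needs. The projection below is illustrative; the variant filter uses Sanity's webhook delta function delta::changedAny to fire only when the slug changes, which suits the redirect use case, and should be verified against your own webhook configuration.

Filter:
_type in ["page", "article", "product", "landingPage"]

Filter variant for slug changes only:
_type in ["page", "article", "product", "landingPage"] && delta::changedAny(slug)

Projection:
{_id, _type, "slug": slug.current}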
5. Run Screaming Frog from a crawl runner
Because Screaming Frog does not provide a hosted API or official JavaScript SDK, call the SEO Spider command-line interface from a licensed runner. The runner can receive the Sanity webhook, fetch the changed URL with @sanity/client, write a URL list, and start a headless crawl.
6. Test the full loop
Publish a test page, confirm the webhook fires, verify the generated URL list, inspect the exported Screaming Frog CSV files, and decide where issues go next, such as Slack, Jira, GitHub, or a custom Sanity Studio dashboard.
Code example
import {createClient} from '@sanity/client';
import {writeFile, mkdir} from 'node:fs/promises';
import {execFile} from 'node:child_process';
import {promisify} from 'node:util';

const exec = promisify(execFile);

// Read-only Sanity client used to look up the published document behind the webhook.
const client = createClient({
  projectId: process.env.SANITY_PROJECT_ID!,
  dataset: process.env.SANITY_DATASET!,
  apiVersion: '2025-01-01',
  token: process.env.SANITY_READ_TOKEN,
  useCdn: false
});

export async function handleWebhook(req: any, res: any) {
  const {_id} = req.body;

  // Resolve the published document (not the draft) and build its public URL.
  const page = await client.fetch(
    `*[_id == $id][0]{"url": "https://www.example.com/" + slug.current}`,
    {id: _id.replace('drafts.', '')}
  );
  if (!page?.url) return res.status(204).end();

  // Write a one-line URL list for Screaming Frog's list mode.
  await mkdir('/tmp/sf', {recursive: true});
  await writeFile('/tmp/sf/urls.txt', page.url + '\n');

  // Run a headless list-mode crawl and export the tabs the SEO team reviews.
  await exec('screamingfrogseospider', [
    '--headless',
    '--crawl-list', '/tmp/sf/urls.txt',
    '--config', '/opt/screamingfrog/sanity.seospiderconfig',
    '--output-folder', '/tmp/sf/out',
    '--export-tabs', 'Internal:All,Page Titles:Missing,Meta Description:Missing,Response Codes:Client Error (4xx)'
  ]);

  res.json({crawled: page.url});
}
How Sanity + Screaming Frog works
Build your Screaming Frog integration on Sanity
Sanity gives you the structured content foundation, real-time event system, and flexible APIs to connect publish workflows with Screaming Frog audits.
Start building free →
CMS approaches to Screaming Frog
| Capability | Traditional CMS | Sanity |
|---|---|---|
| Generating crawl URL lists | Often requires sitemap scraping, database exports, or plugins that vary by site setup. | Uses GROQ to query published URLs, locale variants, parent references, and SEO fields directly from the Content Lake. |
| Triggering crawls on publish | Usually depends on plugin hooks or scheduled jobs, which can miss custom publish flows. | Uses GROQ-powered webhooks to trigger only for relevant document types, publish events, or field changes. |
| Running Screaming Frog automation | Typically needs an external script that polls the site or waits for manual URL exports. | Functions can handle lightweight event processing, while a licensed crawl runner executes the Screaming Frog command-line workflow. |
| Comparing source fields to rendered HTML | SEO teams often compare spreadsheets against pages manually. | A single GROQ query can return title, description, canonical, noindex, language, and related content for comparison against crawl exports. |
| Handling multi-market SEO checks | Locale rules are often embedded in templates, plugins, or separate site instances. | Schemas can model markets, languages, routes, and hreflang relationships so each crawl targets the right regional URL set. |
Keep building
Explore related integrations to complete your content stack.
Sanity + Google Search Console
Connect search performance data with structured content so teams can find pages with impressions, clicks, and content gaps.
Sanity + Ahrefs
Pair backlink and keyword research with Sanity content fields to prioritize updates by page, topic, and market.
Sanity + Semrush
Use Semrush keyword and audit data alongside Sanity schemas to plan SEO updates across pages, campaigns, and locales.