Website

Echo's Website integration is a full-site crawler. You give it a domain or a localised URL; it discovers pages via the sitemap, renders each one through Cloudflare's headless browser, extracts the main content, and indexes everything into your assistant. It works on most static sites, JS-heavy SPAs, e-commerce platforms, and multi-locale shops.

Why use this

  • You have a public website with many pages and want all of them in your assistant's knowledge base.
  • The site doesn't run on WordPress or Framer — those have their own dedicated integrations.
  • You want JS-heavy pages (SPAs, React storefronts, dynamic catalogs) rendered properly, not just raw HTML.
  • Your shop is multi-locale (PrestaShop, Magento, Shopify locales) and you only want one language imported.

How it works

Three stages: discover, validate, crawl.

  1. Discover. Echo looks for a sitemap in this order: the Sitemap: directive in robots.txt, then /sitemap.xml, /sitemap-index.xml, and /sitemap_index.xml. Both regular urlsets and sitemap indexes (nested sitemaps) are supported. Sitemap files with CDATA-wrapped URLs (common on PrestaShop and Magento) are parsed correctly.
  2. Validate. Echo counts the pages it discovered and checks your plan's content limit. If the import would exceed your limit, it errors out before crawling — no wasted work.
  3. Crawl. The URL list is handed to Cloudflare Browser Rendering, which fetches each page with a real headless browser, extracts the main markdown content with metadata (title, OG image, description), and returns it to Echo. Each page becomes one content item in your assistant.
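
The discovery order and sitemap parsing described above can be sketched as follows. This is a minimal illustration, not Echo's actual implementation: the function names sitemap_candidates and parse_sitemap are hypothetical, and fetching the files over HTTP is left out. It shows the lookup order (robots.txt directives first, then the three fallback paths) and that a standard XML parser reads CDATA-wrapped <loc> values as plain text.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Fallback locations, in the order the text lists them.
FALLBACK_PATHS = ["/sitemap.xml", "/sitemap-index.xml", "/sitemap_index.xml"]


def sitemap_candidates(base_url: str, robots_txt: str) -> list[str]:
    """Return sitemap URLs to try: robots.txt directives first, then fallbacks."""
    candidates: list[str] = []
    for line in robots_txt.splitlines():
        # Split on the first colon only, so the URL's own "://" stays intact.
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap" and value.strip():
            candidates.append(value.strip())
    for path in FALLBACK_PATHS:
        url = urljoin(base_url, path)
        if url not in candidates:
            candidates.append(url)
    return candidates


def local_name(tag: str) -> str:
    """Strip an XML namespace: '{http://...}urlset' -> 'urlset'."""
    return tag.rsplit("}", 1)[-1]


def parse_sitemap(xml_text: str) -> tuple[str, list[str]]:
    """Return ('index' or 'urlset', list of <loc> values)."""
    root = ET.fromstring(xml_text)
    kind = "index" if local_name(root.tag) == "sitemapindex" else "urlset"
    locs: list[str] = []
    for entry in root:  # <url> or <sitemap> elements
        for child in entry:
            # CDATA sections arrive as ordinary text, so no special handling is needed.
            if local_name(child.tag) == "loc" and child.text and child.text.strip():
                locs.append(child.text.strip())
    return kind, locs
```

When parse_sitemap reports an index, each returned URL is itself a sitemap that would be fetched and parsed the same way.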

The crawl runs asynchronously in the background. The integration card shows status (crawling, active, or error) with a notification bar at the top of the dashboard while it's running. Typical sites finish in 1–10 minutes; very large catalogs may take longer.

How it differs from related options

  • vs URL manual upload — URL indexes one page; Website crawls every page in the sitemap. URL uses a simpler AI extraction; Website uses Cloudflare's headless browser, which handles JS-heavy sites better. Pick URL for 1–5 pages, Website for a whole site.
  • vs WordPress — WordPress uses the REST API plus webhooks for real-time sync. Website is a polling/re-crawl model. If you have WordPress, prefer WordPress.
  • vs Framer — Echo auto-detects Framer sites even when submitted via the Website flow and silently switches to the Framer pipeline. You don't need to pick.

Step-by-step setup

  1. Open Integrations from the dashboard's Synced content sources section.
  2. Click Add source and select Website.
  3. Enter the website URL. Use the homepage (e.g. https://example.com) for a full-site crawl, or a path-prefixed URL (e.g. https://example.com/en) to narrow to a single locale.
  4. Select which assistant should receive the synced content.
  5. Submit. Echo validates the sitemap, checks your content limit, and starts the crawl in the background.

Watch the integration card status update as the crawl progresses. You can refresh content later by re-syncing the integration.

Multi-locale and large catalogs

If your site has a multi-language sitemap-index (one nested sitemap per locale), Echo can scope the crawl to a single locale.

  • Submit a path-prefixed URL — for example https://shop.example.com/en instead of the bare domain. Echo extracts the path prefix and uses it to filter the sitemap to URLs starting with /en.
  • Segment-aware matching — the filter respects path segments, so /en matches /en and /en/anything, but never /english-only.
  • Cloudflare side scoping — the same path prefix is passed to Cloudflare's includePatterns so the headless browser doesn't waste time on URLs it would discard.
  • Crawl budget — Echo automatically requests a slightly higher Cloudflare limit than the sitemap page count, to absorb URLs that the headless browser legitimately skips (404s, redirects, dedupe). You don't configure this; it's handled automatically.
  • Crawl verification — once the crawl finishes, Echo checks that the crawled page count is within tolerance of the sitemap page count (15% with a 10-page floor). Significant gaps trigger an error so you can investigate.

Fallback behaviour

If your path-prefix filter would eliminate every URL in the sitemap, Echo falls back to the unfiltered list rather than reporting "no pages found". Better to over-include than to surface a misleading empty result.
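
The segment-aware filter and its fallback can be sketched as follows. This is an illustrative sketch under assumptions, not Echo's code: the helper names path_prefix, matches_prefix, and filter_by_locale are hypothetical. It shows why /en matches /en and /en/anything but not /english-only, and how an empty filter result falls back to the full list.

```python
from urllib.parse import urlsplit


def path_prefix(submitted_url: str) -> str:
    """Extract the path prefix from the submitted URL ('' for a bare domain)."""
    return urlsplit(submitted_url).path.rstrip("/")


def matches_prefix(url: str, prefix: str) -> bool:
    """Segment-aware match: the prefix must end on a path-segment boundary."""
    if not prefix:
        return True  # bare domain: nothing to filter on
    path = urlsplit(url).path
    return path == prefix or path.startswith(prefix + "/")


def filter_by_locale(urls: list[str], submitted_url: str) -> list[str]:
    """Keep URLs under the submitted prefix; fall back to all URLs if none match."""
    prefix = path_prefix(submitted_url)
    kept = [u for u in urls if matches_prefix(u, prefix)]
    # Fallback: an empty result is treated as a bad filter, not an empty site.
    return kept or list(urls)
```

Comparing against prefix + "/" rather than the bare prefix is what keeps /english-only out of an /en crawl.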

Common errors and troubleshooting

  • "No sitemap found" — Echo couldn't find a sitemap at any of the standard locations. Most CMSs and e-commerce platforms ship a sitemap by default; check that it's enabled and publicly readable.
  • "Sitemap found but it contains no pages" — the sitemap file parsed correctly but had zero <loc> entries. Check the sitemap's contents directly in your browser.
  • "Content limit exceeded" — the sitemap has more pages than your plan allows. Narrow to a locale with a path prefix, or upgrade your plan.
  • "Failed to validate website" — the homepage or sitemap timed out (10-second timeout per request). Try again, or check that your origin isn't blocking the EchoPlatform/1.0 user-agent.
  • "Crawled fewer pages than expected" — significant gap between sitemap count and crawled count, exceeding the 15% tolerance (10-page floor). Some pages failed to render — check those URLs individually in a browser.
  • "Crawl ended with status cancelled_due_to_timeout" — Cloudflare's crawler exceeded its time budget on a very large site. Narrow to a locale or split the crawl into smaller integrations.
  • "Crawl ended with status cancelled_due_to_limits" — the page count exceeded Cloudflare's hard cap. Narrow the scope with a path prefix.
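
To estimate whether a gap would trigger "Crawled fewer pages than expected", the tolerance can be sketched as below. This assumes one reading of "15% with a 10-page floor": the allowed gap is the larger of 15% of the sitemap count and 10 pages. That reading, the rounding, and the function names allowed_gap and crawl_is_complete are assumptions; the source does not spell them out.

```python
TOLERANCE_PERCENT = 15  # from the text
TOLERANCE_FLOOR = 10    # pages, from the text


def allowed_gap(sitemap_count: int) -> int:
    """Assumed reading: the larger of 15% of the sitemap count and 10 pages."""
    # Integer arithmetic avoids floating-point surprises; rounding down is an assumption.
    return max(TOLERANCE_FLOOR, sitemap_count * TOLERANCE_PERCENT // 100)


def crawl_is_complete(sitemap_count: int, crawled_count: int) -> bool:
    """True when the shortfall stays within the allowed gap."""
    return sitemap_count - crawled_count <= allowed_gap(sitemap_count)
```

Under this reading, a 40-page sitemap tolerates up to 10 missing pages (the floor applies), while a 1,000-page sitemap tolerates up to 150.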