Vol. 03 · Field Guide · swallowtail · docs · updated 2026 · 05 · 12
No. 14b · Docs

Sources.

Five input modes, each with its own quirks. Mix them freely between the old and new sides.

Sitemap URL

Paste the URL to sitemap.xml or sitemap_index.xml. We follow nested sitemap indexes up to three levels. Most marketing CMSes generate one at /sitemap.xml automatically.
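To make the nesting rule concrete, here is a minimal sketch of walking a sitemap index with a depth cap. It is an illustration, not our importer: `iter_sitemap_urls`, `fetch`, and `MAX_INDEX_DEPTH` are hypothetical names, and it assumes the top-level index counts as the first of the three levels.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator

# Namespace defined by the sitemaps.org protocol.
NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
# Assumption: "three levels" means index documents at depth 0, 1 and 2.
MAX_INDEX_DEPTH = 3


def iter_sitemap_urls(
    url: str, fetch: Callable[[str], str], depth: int = 0
) -> Iterator[str]:
    """Yield page URLs from a sitemap, following nested indexes up to the cap.

    `fetch` is a caller-supplied function that returns the XML text for a URL,
    so the sketch stays independent of any HTTP library.
    """
    root = ET.fromstring(fetch(url))
    if root.tag == NS + "sitemapindex":
        if depth >= MAX_INDEX_DEPTH:
            return  # index nested too deep: stop following
        for loc in root.findall(f"{NS}sitemap/{NS}loc"):
            child = (loc.text or "").strip()
            if child:
                yield from iter_sitemap_urls(child, fetch, depth + 1)
    else:
        # A plain <urlset>: emit each <url><loc> entry.
        for loc in root.findall(f"{NS}url/{NS}loc"):
            page = (loc.text or "").strip()
            if page:
                yield page
```

Passing `fetch` in as a parameter keeps the walk testable against canned XML and leaves timeouts and retries to the caller.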

Sitemap file

Upload an XML file directly. Useful when the source site is no longer reachable or sits behind a firewall. Accepts files up to 50 MB.

GSC export

Pull the Performance report from Google Search Console as CSV and upload. Captures pages Google has actually indexed, including ones missing from the official sitemap.
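If you want to sanity-check an export before uploading, a rough sketch like the one below pulls the page URLs out of the CSV. It assumes the page column is headed `Top pages`, as in the Pages file of a typical Performance export; check your own file and pass a different `column` if it differs. `urls_from_gsc_csv` is a hypothetical helper, not part of the product.

```python
import csv
import io


def urls_from_gsc_csv(text: str, column: str = "Top pages") -> list[str]:
    """Return the unique page URLs from a Search Console CSV, in file order.

    Assumption: the URL column is named `column`; rows without it are skipped.
    """
    # Drop a leading byte-order mark if the file was saved with one.
    reader = csv.DictReader(io.StringIO(text.lstrip("\ufeff")))
    seen: set[str] = set()
    urls: list[str] = []
    for row in reader:
        url = (row.get(column) or "").strip()
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls
```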

Plain text

One URL per line. Use # for comments. Drop a .txt file or paste directly.
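The format is simple enough to sketch in a few lines. This version assumes a comment is a whole line starting with `#`, since a `#` inside a URL marks a fragment and should survive; `parse_url_list` is a hypothetical name used only for illustration.

```python
def parse_url_list(text: str) -> list[str]:
    """Parse a one-URL-per-line list, skipping blank lines and comment lines."""
    urls: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        # Assumption: only lines that begin with '#' are comments.
        if not line or line.startswith("#"):
            continue
        urls.append(line)
    return urls
```

For example, a file holding a comment line, `https://example.com/a`, a blank line, and `https://example.com/b#top` would yield the two URLs with the fragment intact.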

Live crawl

Give us a domain and we crawl it under the configured depth and concurrency caps, honoring robots.txt. Best when no sitemap exists or it's stale.

Crawl options

  • Max depth. Default 4. Raise it for deep blog archives.
  • Max URLs. Default 2,000. Hard-capped at your plan's per-project URL limit.
  • Concurrency. Default 8 parallel fetches. Lower it for hosts that are easily overloaded.
  • robots.txt. Respected by default. Disabling requires a confirmation.
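How these options might combine for a single candidate URL can be sketched as below. This is an assumption-laden illustration, not the crawler itself: `CrawlOptions`, `effective_max_urls`, and `may_fetch` are hypothetical names, the seed page is assumed to sit at depth 0, and concurrency is only carried as a setting. The robots.txt check uses the standard library's `urllib.robotparser`.

```python
from dataclasses import dataclass
from urllib.robotparser import RobotFileParser


@dataclass(frozen=True)
class CrawlOptions:
    # Defaults mirror the list above.
    max_depth: int = 4
    max_urls: int = 2000
    concurrency: int = 8  # parallel fetches; not exercised in this sketch
    respect_robots: bool = True


def effective_max_urls(opts: CrawlOptions, plan_url_limit: int) -> int:
    """The plan's per-project limit always wins over the requested cap."""
    return min(opts.max_urls, plan_url_limit)


def may_fetch(
    url: str,
    depth: int,
    fetched: int,
    opts: CrawlOptions,
    robots: RobotFileParser,
    plan_url_limit: int,
    user_agent: str = "*",
) -> bool:
    """Decide whether one candidate URL is within the configured caps."""
    if depth > opts.max_depth:  # assumption: seed page is depth 0
        return False
    if fetched >= effective_max_urls(opts, plan_url_limit):
        return False
    if opts.respect_robots and not robots.can_fetch(user_agent, url):
        return False
    return True
```

A caller would load the rules once per host, for example with `robots.parse(["User-agent: *", "Disallow: /private/"])`, and then call `may_fetch` for each discovered link.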