How can I whitelist the IP or user agent of a third-party vendor to crawl my Webflow site for legal compliance archiving purposes?

TL;DR
  • Ensure your Webflow site is public, indexing is enabled, and robots.txt allows the vendor’s crawler.
  • Share the vendor’s User-Agent or IP with Webflow Enterprise Support for possible exceptions.
  • Advise the vendor to crawl slowly and respect robots.txt to avoid bot detection.
  • For stricter access control, set up an external reverse proxy with custom firewall rules.

Webflow doesn't support native IP or User-Agent whitelisting, so letting a third-party vendor crawl your site (e.g., for legal compliance archiving) means keeping the site crawlable and working around the platform's managed firewall rather than configuring it.

1. Understand Webflow's Hosting Firewall

  • Webflow hosts sites behind a managed global CDN and firewall that you cannot configure yourself.
  • You cannot whitelist specific IPs or User-Agents directly within Webflow.
  • However, published Webflow sites are crawlable by default unless you block crawlers through Webflow's settings or robots.txt.

2. Ensure Your Site Is Public and Crawlable

  • Go to Project Settings > SEO tab.
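  • Make sure “Disable Webflow subdomain indexing” is unchecked if the vendor will crawl the webflow.io staging domain (this setting does not affect your custom domain).
  • In the robots.txt editor, make sure nothing blocks the vendor's crawler:
      • Avoid a blanket block such as “User-agent: *” followed by “Disallow: /”.
      • Optionally add a group for the vendor: “User-agent: [VendorBot]” followed by “Allow: /” (replace [VendorBot] with the bot's actual name). A crawler that follows robots.txt uses the most specific group matching its User-Agent, so this group applies to the vendor even if the “*” group is more restrictive.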
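Before publishing a robots.txt change, it can help to confirm how a compliant crawler would read it. The sketch below uses Python's standard-library urllib.robotparser; the token ArchiveVendorBot and the example.com URL are placeholders, not values from Webflow or any real vendor. Note that robotparser does not implement every detail of the robots.txt standard (for example, wildcard path patterns), so the vendor's own parser may behave slightly differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. "ArchiveVendorBot" is a placeholder;
# substitute the exact User-Agent token your vendor documents.
ROBOTS_TXT = """\
User-agent: ArchiveVendorBot
Allow: /

User-agent: *
Disallow: /
"""


def vendor_can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://www.example.com/legal/terms"
    print(vendor_can_crawl(ROBOTS_TXT, "ArchiveVendorBot/1.0", page))  # True
    print(vendor_can_crawl(ROBOTS_TXT, "SomeOtherBot/2.0", page))      # False
```

The vendor-specific group is matched first, so the vendor is allowed even though the catch-all group blocks everyone else.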

3. Share Custom User-Agent or IPs With Webflow Support

  • If the vendor has a known static IP address or unique User-Agent, contact Webflow Enterprise Support.
  • In rare cases, Webflow Support may arrange exceptions for Enterprise customers to reduce false flagging or rate limiting of that crawler.

4. Avoid Blocking by Rate Limits or Bot Detection

  • Many third-party crawlers get blocked due to aggressive crawling behavior.
  • Advise the vendor to:
      • Throttle their crawl rate (e.g., about 1 request per second).
      • Use a custom User-Agent string that clearly identifies the crawler.
      • Respect robots.txt rules.
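On the vendor side, the three recommendations above (throttling, an identifying User-Agent, and honoring robots.txt) can be combined in a small crawl loop. This is a minimal Python sketch, assuming a hypothetical USER_AGENT string and a one-second minimum gap between requests; a real archiving crawler would also need retries, fuller error handling, and link discovery.

```python
import time
import urllib.request
from urllib.robotparser import RobotFileParser

# Hypothetical crawler identity; the vendor would substitute its own string.
USER_AGENT = "ArchiveVendorBot/1.0 (+https://vendor.example/bot-info)"
MIN_INTERVAL_S = 1.0  # about 1 request per second


class Throttle:
    """Enforces a minimum gap of `interval` seconds between calls to wait()."""

    def __init__(self, interval, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self._clock = clock  # injectable so the pacing can be tested
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def build_request(url):
    """Attach an identifying User-Agent so the site owner can recognize the crawler."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def polite_crawl(urls, robots: RobotFileParser, throttle: Throttle):
    """Yield (url, body) for each URL that robots.txt allows, pacing requests."""
    for url in urls:
        if not robots.can_fetch(USER_AGENT, url):
            continue  # respect robots.txt: skip disallowed URLs
        throttle.wait()
        with urllib.request.urlopen(build_request(url), timeout=30) as resp:
            yield url, resp.read()
```

For a live run, the site's file can be loaded with `rp = RobotFileParser()`, `rp.set_url("https://www.example.com/robots.txt")`, and `rp.read()`, then iterated with `polite_crawl(urls, rp, Throttle(MIN_INTERVAL_S))`. Injecting the clock and sleep functions keeps the pacing logic testable without real delays.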

5. Use a Proxy or Middleware (Workaround for Enterprise Clients)

  • If whitelisting is non-negotiable, consider setting up a reverse proxy server outside Webflow that forwards requests to your Webflow site and lets you enforce your own allowlist.
  • This works by:
      • Hosting the proxy on a separate server.
      • Serving the exact content of your Webflow site through the proxy.
      • Applying custom firewall rules to the proxy server.
  • This requires developer resources and is typically used only in enterprise setups.
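The proxy idea can be sketched with Python's standard library alone. This is an illustration only, assuming hypothetical values for UPSTREAM (your Webflow site's URL) and ALLOWED_NETWORKS (the vendor's published IP ranges, shown here as the 203.0.113.0/24 documentation range). It handles GET requests only, relays the body and Content-Type, and treats the direct TCP peer as the client, which would be wrong if another load balancer sits in front of it. A production deployment would use a hardened, dedicated proxy with TLS.

```python
import ipaddress
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical placeholders: your Webflow site's public URL and the vendor's
# published IP ranges (203.0.113.0/24 is a documentation-only range).
UPSTREAM = "https://www.example.com"
ALLOWED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]


def is_allowed(client_ip):
    """Return True if client_ip falls inside one of ALLOWED_NETWORKS."""
    try:
        addr = ipaddress.ip_address(client_ip)
    except ValueError:
        return False  # not a valid IP address
    return any(addr in net for net in ALLOWED_NETWORKS)


class AllowlistProxy(BaseHTTPRequestHandler):
    """Relays GET requests from allowlisted IPs to UPSTREAM; rejects the rest."""

    def do_GET(self):
        # client_address is the direct TCP peer, not any X-Forwarded-For value.
        if not is_allowed(self.client_address[0]):
            self.send_error(403, "Forbidden")
            return
        headers = {}
        user_agent = self.headers.get("User-Agent")
        if user_agent:
            headers["User-Agent"] = user_agent  # pass the vendor's UA through
        upstream_req = urllib.request.Request(UPSTREAM + self.path, headers=headers)
        try:
            with urllib.request.urlopen(upstream_req, timeout=30) as resp:
                status = resp.status
                content_type = resp.headers.get("Content-Type", "application/octet-stream")
                body = resp.read()
        except urllib.error.HTTPError as err:
            # Relay upstream error pages (e.g. 404) instead of failing.
            status = err.code
            content_type = err.headers.get("Content-Type", "text/plain")
            body = err.read()
        except urllib.error.URLError:
            self.send_error(502, "Bad Gateway")
            return
        self.send_response(status)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Run the proxy until interrupted; terminate TLS in front of it in production."""
    ThreadingHTTPServer(("0.0.0.0", port), AllowlistProxy).serve_forever()
```

Calling `serve(8080)` starts the proxy; requests from addresses outside the allowlist receive 403 Forbidden, while allowlisted clients get the upstream page.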

Summary

Webflow does not allow direct IP or User-Agent whitelisting, but sites are publicly crawlable by default. Ensure robots.txt and SEO settings allow access, and work with Webflow support if needed. For advanced control, use an external proxy solution.
