
Bot & AI Protection

Aegis provides per-host bot and AI protection that combines WAF-based scanner detection, AI crawler blocking, dynamic robots.txt generation, and the Aegis Shield challenge system. All bot protection settings are configured per proxy host: Admin UI -> Hosts -> edit a proxy host -> Bot Protection.

Protection Layers

| Layer | What It Does |
| --- | --- |
| Block Known Bots | WAF-based scanner detection (sqlmap, Nikto, Nmap, Burp Suite, Nuclei, WPScan, etc.) |
| Block AI Crawlers | User-Agent filtering for known AI training and scraping bots |
| Robots.txt | Dynamic robots.txt generation with EU Directive 2019/790 compliance and content-signal directives |
| Aegis Shield | Proof-of-work + browser verification challenge (see Aegis Shield) |

AI Crawler Blocking

When enabled, Aegis blocks requests from the following AI crawler User-Agents:
| Bot | Operator |
| --- | --- |
| GPTBot | OpenAI |
| ChatGPT-User | OpenAI |
| OAI-SearchBot | OpenAI |
| ClaudeBot | Anthropic |
| Claude-SearchBot | Anthropic |
| Bytespider | ByteDance |
| Google-Extended | Google |
| Amazonbot | Amazon |
| Applebot-Extended | Apple |
| CCBot | Common Crawl |
| cohere-ai | Cohere |
| Diffbot | Diffbot |
| FacebookBot | Meta |
| meta-externalagent | Meta |
| ImagesiftBot | Imagesift |
| PerplexityBot | Perplexity |
| YouBot | You.com |
| omgili | Omgili |
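The blocking decision amounts to matching a request's User-Agent header against the tokens above. A minimal sketch, assuming case-insensitive substring matching (the actual matching logic inside Aegis is not specified in this document and may differ):

```python
# Hypothetical sketch of AI crawler detection via User-Agent matching.
# Assumes case-insensitive substring matching against the known tokens;
# Aegis's real implementation may use a different strategy.
AI_CRAWLER_TOKENS = [
    "GPTBot", "ChatGPT-User", "OAI-SearchBot", "ClaudeBot",
    "Claude-SearchBot", "Bytespider", "Google-Extended", "Amazonbot",
    "Applebot-Extended", "CCBot", "cohere-ai", "Diffbot", "FacebookBot",
    "meta-externalagent", "ImagesiftBot", "PerplexityBot", "YouBot", "omgili",
]

def is_ai_crawler(user_agent: str) -> bool:
    """Return True if the User-Agent contains any known AI crawler token."""
    ua = user_agent.lower()
    return any(token.lower() in ua for token in AI_CRAWLER_TOKENS)

print(is_ai_crawler("Mozilla/5.0 (compatible; GPTBot/1.0)"))          # True
print(is_ai_crawler("Mozilla/5.0 (Windows NT 10.0) Firefox/128.0"))   # False
```

A matched request would typically receive a 403 response before ever reaching the upstream host.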

Robots.txt Generation

Aegis can dynamically serve a robots.txt file for each protected host. The generated file is assembled from the components described below.

Content-Signal Directives

These directives communicate content usage preferences to compliant crawlers:

| Directive | Description |
| --- | --- |
| Content-Signal: search | Allow use of content in search results |
| Content-Signal: ai-input | Allow use of content as AI model input |
| Content-Signal: ai-train | Allow use of content for AI model training |

Each signal can be independently enabled or disabled per host.
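For illustration, a host that permits search indexing but opts out of AI use might emit a group like the following. The exact line format Aegis produces is not specified here; this fragment follows the common `signal=yes/no` convention for content signals:

```text
User-agent: *
Content-Signal: search=yes, ai-input=no, ai-train=no
Allow: /
```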

Configuration Options

| Setting | Description |
| --- | --- |
| Enable robots.txt | Serve a dynamically generated robots.txt |
| Allow Search | Include Content-Signal: search |
| Allow AI Input | Include Content-Signal: ai-input |
| Allow AI Training | Include Content-Signal: ai-train |
| Block AI Crawlers | Add Disallow: / rules for each known AI crawler bot |
| Block All Bots | Block all bots except those explicitly allowed |
| Disallow Paths | Paths to disallow for all bots |
| Allow Paths | Paths to explicitly allow |
| Sitemap URL | Sitemap location to include in robots.txt |
| Custom Rules | Additional raw robots.txt directives |
The generated robots.txt includes a header referencing EU Directive 2019/790 for content rights signaling compliance.
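Putting these settings together, a generated file might look like the sketch below. The header wording, bot list, and paths are illustrative assumptions, not Aegis's literal output:

```text
# Content rights reserved under EU Directive 2019/790, Article 4
# Generated by Aegis (illustrative example)

User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Disallow: /admin/
Allow: /public/

Sitemap: https://example.com/sitemap.xml
```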

Per-Host Configuration

Each proxy host has its own independent bot protection configuration stored as JSON:

| Setting | Type | Description |
| --- | --- | --- |
| enabled | boolean | Master toggle for all bot protection |
| blockKnownBots | boolean | Enable WAF scanner detection rules |
| blockAICrawlers | boolean | Block known AI crawler User-Agents |
| robotsTxt | object | Robots.txt generation settings |
| shieldChallenge | boolean | Enable Aegis Shield challenge |
| shieldMode | string | managed, invisible, or interactive |
| shieldDifficulty | number | Proof-of-work difficulty (1-5) |
| shieldCookieTTLMinutes | number | How long the pass cookie is valid |
| shieldExemptPaths | string | Paths that bypass the challenge |
| shieldExemptCIDRs | string | IP ranges that bypass the challenge |
| shieldExemptUserAgents | string | User-Agents that bypass the challenge |
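A sketch of such a per-host configuration object, using the top-level keys from the table above; the keys inside the `robotsTxt` object and all values are assumptions for illustration, since this document does not enumerate them:

```json
{
  "enabled": true,
  "blockKnownBots": true,
  "blockAICrawlers": true,
  "robotsTxt": {
    "enabled": true,
    "allowSearch": true,
    "allowAIInput": false,
    "allowAITraining": false
  },
  "shieldChallenge": true,
  "shieldMode": "managed",
  "shieldDifficulty": 3,
  "shieldCookieTTLMinutes": 60,
  "shieldExemptPaths": "/api/health",
  "shieldExemptCIDRs": "10.0.0.0/8",
  "shieldExemptUserAgents": "UptimeRobot"
}
```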

Shield Difficulty Levels

| Level | Approximate Time | Description |
| --- | --- | --- |
| 1 | ~50ms | Low |
| 2 | ~200ms | Medium-Low |
| 3 | ~500ms–1s | Medium (default) |
| 4 | ~2–5s | High |
| 5 | ~10–30s | Maximum |
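The timings above come from a hash-based proof-of-work: the client must find a nonce whose hash meets a difficulty target, and each extra level multiplies the expected work. Aegis's actual puzzle format is not specified in this document; a generic sketch of the idea, assuming SHA-256 with `difficulty` leading zero hex digits as the target:

```python
# Generic proof-of-work sketch (illustrative; not Aegis's actual puzzle).
# Each +1 in difficulty multiplies the expected number of attempts by 16.
import hashlib
import itertools

def solve_pow(challenge: str, difficulty: int) -> int:
    """Brute-force a nonce so sha256(challenge + nonce) starts with
    `difficulty` zero hex digits."""
    target = "0" * difficulty
    for nonce in itertools.count():
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce

def verify_pow(challenge: str, difficulty: int, nonce: int) -> bool:
    """Server-side check: a single hash, cheap to verify."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)

nonce = solve_pow("aegis-demo", 3)
print(verify_pow("aegis-demo", 3, nonce))  # True
```

The asymmetry is the point: solving costs the client many hash attempts, while the server verifies a pass with one hash before issuing the TTL-limited cookie.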

Configuration Locations

  • Per-host bot settings: Admin UI -> Hosts -> edit host -> Bot Protection
  • Global bot config: Admin UI -> Config -> Bots