AI Crawlers
Specialised web crawlers operated by AI platforms, including GPTBot (OpenAI), ClaudeBot (Anthropic), Google-Extended (Google), and PerplexityBot (Perplexity), that collect web content for LLM training and AI search.
Major AI Crawlers
| Crawler | Operator | Primary Use | User-agent string |
|---|---|---|---|
| GPTBot | OpenAI | Training data collection | GPTBot |
| OAI-SearchBot | OpenAI | ChatGPT Search retrieval | OAI-SearchBot |
| ClaudeBot | Anthropic | Training + Claude browsing | ClaudeBot |
| Google-Extended | Google | Gemini training and grounding | Google-Extended |
| PerplexityBot | Perplexity | Perplexity search | PerplexityBot |
Note that OpenAI operates two separate crawlers: GPTBot for training data collection and OAI-SearchBot specifically for ChatGPT Search retrieval. Blocking GPTBot does not block OAI-SearchBot, and vice versa. Brands wishing to opt out of training data while preserving search visibility should block GPTBot but allow OAI-SearchBot.
robots.txt Configuration
To allow all AI crawlers, add the following to your robots.txt file:
User-agent: GPTBot
Allow: /
User-agent: OAI-SearchBot
Allow: /
User-agent: ClaudeBot
Allow: /
User-agent: Google-Extended
Allow: /
User-agent: PerplexityBot
Allow: /
To block specific crawlers (for example, to opt out of training data while keeping search access), use Disallow: / for the training bot and Allow: / for the search bot where the two are separate user agents. Many brands block GPTBot (training) while explicitly allowing OAI-SearchBot (search). This is a valid configuration that preserves ChatGPT Search visibility while opting out of training data use.
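As an illustration, the training opt-out configuration described above could be generated from a simple policy map. This is a minimal Python sketch; `build_robots_txt` and `POLICY` are hypothetical names introduced here for illustration, not part of any crawler operator's tooling.

```python
def build_robots_txt(policy: dict[str, bool]) -> str:
    """Render one robots.txt group per crawler; True means allow, False means block."""
    groups = []
    for user_agent, allowed in policy.items():
        directive = "Allow: /" if allowed else "Disallow: /"
        groups.append(f"User-agent: {user_agent}\n{directive}")
    # A blank line between groups keeps the file easy to read.
    return "\n\n".join(groups) + "\n"


# Hypothetical policy: opt out of OpenAI training, stay visible in ChatGPT Search.
POLICY = {"GPTBot": False, "OAI-SearchBot": True}

if __name__ == "__main__":
    print(build_robots_txt(POLICY), end="")
```

Crawlers that are not named in the policy get no group of their own, so they fall back to whatever default rules the rest of the file defines.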
Frequently Asked Questions
- Should I block AI crawlers to protect my content?
- Blocking AI crawlers can keep your content out of AI-generated answers. For most brands this is harmful, because it removes them from consideration when AI systems answer questions in their category. Block specific crawlers only if you have a legal or business reason to opt out of LLM training datasets.
- How do I check which AI crawlers are currently blocked on my site?
- Check your robots.txt file (yourdomain.com/robots.txt) for Disallow directives against GPTBot, OAI-SearchBot, ClaudeBot, Google-Extended, and PerplexityBot. An AnswerAtlas AI Visibility Audit checks this automatically as part of Step 1.
- Do AI crawlers respect robots.txt?
- The major AI crawlers, including GPTBot, ClaudeBot, Google-Extended, and PerplexityBot, publicly state that they honour robots.txt directives. Compliance among smaller or newer AI platforms is less consistent.
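The manual robots.txt check described in the FAQ above can also be scripted. The sketch below uses Python's standard `urllib.robotparser` module to report which of the crawlers named on this page may fetch the site root. `audit_robots_txt`, `AI_CRAWLERS`, `SAMPLE`, and the example.com URL are assumptions made for illustration, and the parser's matching rules may differ in detail from those of any individual crawler.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens taken from the table on this page.
AI_CRAWLERS = ("GPTBot", "OAI-SearchBot", "ClaudeBot", "Google-Extended", "PerplexityBot")

# Hypothetical robots.txt content used only for the demonstration below.
SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def audit_robots_txt(robots_txt: str, site_root: str = "https://example.com/") -> dict[str, bool]:
    """Return {crawler: True if it may fetch site_root} for each AI crawler."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, site_root) for bot in AI_CRAWLERS}


if __name__ == "__main__":
    for bot, allowed in audit_robots_txt(SAMPLE).items():
        print(f"{bot}: {'allowed' if allowed else 'blocked'}")
```

To check a live site instead of a string, the same parser can load the file over the network with `set_url()` followed by `read()`.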