AI Search Optimisation
The 8 AI Crawlers You Should Be Allowing in 2026 (And How)
A definitive list of every AI crawler your website should explicitly welcome in 2026, who runs each one, and the exact robots.txt block to copy. Updated for the current AI search ecosystem.
If you've already read our piece on why your website is invisible to ChatGPT, you know the fix is in robots.txt. But which crawlers actually matter, and which ones are noise? This is the current shortlist — every AI crawler worth explicitly allowing in 2026.
The 8 AI crawlers to know
| Crawler | Owner | What it powers | Why it matters |
|---|---|---|---|
| GPTBot | OpenAI | ChatGPT training + indexing | The biggest AI surface today. Single largest source of AI citations. |
| ChatGPT-User | OpenAI | Live web fetches when ChatGPT is asked to visit a page | Different from GPTBot — used in real time, not training |
| OAI-SearchBot | OpenAI | ChatGPT Search (the search-engine product) | Powers the answer-engine experience |
| ClaudeBot | Anthropic | Claude indexing + research | Growing fast in B2B and research-heavy verticals |
| PerplexityBot | Perplexity | Perplexity search + answer engine | Highest citation visibility — Perplexity always shows sources |
| Google-Extended | Google | AI Overviews + Gemini training | Separate from Googlebot — controls AI answers without affecting normal SEO |
| Applebot-Extended | Apple | Apple Intelligence + Siri | ~30% of AU mobile traffic uses iOS — Apple AI is the default |
| Meta-ExternalAgent | Meta | Llama training + Meta AI | Meta AI is integrated into Instagram, WhatsApp, Messenger search |
The honourable mentions
Worth knowing but lower priority — they're either smaller in volume or operate slightly differently:
- `Amazonbot` — Amazon's general crawler, increasingly tied to Alexa AI
- `CCBot` — Common Crawl, the open dataset that trains many smaller LLMs
- `DuckAssistBot` — DuckDuckGo's AI features
- `MistralAI-User` — Mistral's crawler (popular in EU markets)
- `Bytespider` — ByteDance / Doubao (TikTok parent), AI for the Chinese market
- `Diffbot` — knowledge-graph data feeding many LLMs as a service
- `YouBot` — You.com's AI search
- `cohere-ai` — Cohere's enterprise LLM training
The exact robots.txt to copy
Drop this into your robots.txt file at yourdomain.com/robots.txt. It explicitly welcomes every major AI crawler and keeps traditional search (Google, Bing) accessible:
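A minimal version of that block, built from the crawler tokens in the table above (the Sitemap URL is a placeholder):

```
# Explicitly welcome the major AI crawlers
User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: Applebot-Extended
Allow: /

User-agent: Meta-ExternalAgent
Allow: /

# Keep traditional search (Google, Bing) and everything else open
User-agent: *
Allow: /

Sitemap: https://yourdomain.com/sitemap.xml
```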
Replace yourdomain.com on the last line with your actual domain. Save as plain text. Upload to the root of your site (same folder as your homepage).
Platform-specific instructions
- Squarespace: can't edit `robots.txt` directly. Toggle every crawler ON under Settings → Crawlers & Spiders. Squarespace blocks the major AI bots by default — see our deeper guide on the Squarespace trap.
- WordPress: use Rank Math or Yoast (both have a `robots.txt` editor in their SEO settings) — or edit directly via your hosting file manager.
- Wix: Settings → SEO Tools → Robots.txt. Paste the block above.
- Shopify: theme code → `robots.txt.liquid`. Newer plans allow direct edits; older plans require theme code customisation.
- Custom builds (static HTML, Vercel, Netlify): drop a file named `robots.txt` into your public directory. Done.
How to verify it worked
Once your file is live:
- Visit `yourdomain.com/robots.txt` in an incognito window. Confirm the new content loads.
- Use Google Search Console's robots.txt report to validate parsing.
- Wait 2–4 weeks, then ask ChatGPT: "Tell me about [your business name]." Accurate, cited answer? You're in.
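You can also script the parsing check. A small sketch using Python's standard-library `urllib.robotparser` (the sample rules below are illustrative — in practice, point it at your live file):

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt content. In practice you'd call
# parser.set_url("https://yourdomain.com/robots.txt") and parser.read().
rules = """
User-agent: GPTBot
Allow: /

User-agent: Bytespider
Disallow: /
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# Check which crawlers may fetch the homepage
for bot in ["GPTBot", "Bytespider"]:
    status = "allowed" if parser.can_fetch(bot, "/") else "blocked"
    print(f"{bot}: {status}")
```

This confirms your directives parse the way you intended before you wait weeks for crawlers to show up in your logs.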
Robots.txt is necessary — but not sufficient. Allowing crawlers gets them to your door. To get cited, the content behind the door also needs schema markup, FAQ structure, and clear topical authority. Read our AI Search Optimisation Brisbane page for the full picture.
What if I don't want some of these crawlers?
You can selectively block crawlers if you have a real reason to. Common cases:
- Concerned about training data: block `GPTBot`, `ClaudeBot`, `Google-Extended`, `Meta-ExternalAgent`. Note: this also blocks AI citation, not just training.
- Heavy server load from a specific bot: block `Bytespider` or `CCBot` first — these are the most aggressive crawlers.
- Paywall content: use `Disallow: /premium/` patterns rather than blanket-blocking the bot.
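A path-level carve-out of that kind keeps the crawler welcome everywhere except the protected section. For example (the `/premium/` path is illustrative — substitute your own):

```
User-agent: GPTBot
Disallow: /premium/
Allow: /
```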
For most small businesses in 2026, the answer is: allow everything. The trade-off is short-term invisibility on AI surfaces versus a theoretical training-data concern that doesn't really apply to a five-page service site.
Updated when?
This list reflects the AI crawler ecosystem as of . The ecosystem is shifting fast — new players appear quarterly. We update this page when meaningful changes happen. Bookmark it.
Get a free AI crawler audit
We'll check your robots.txt, schema, and AI citation status — and send back a one-page report with the highest-impact fixes.