Bots play an active role in AEO too, just as search engine bots and the good old robots.txt file were part of the Search Engine Optimization game. For example, Perplexity AI uses bots for real-time information retrieval. These bots search the internet according to various rules and instructions to gather insights from top-tier sources. This lets answer engines crawl, index, and learn so they can provide current information when answering questions.
AE / LLM / Org | Bot | User Agent / Bot name to use |
---|---|---|
Anthropic | Claude | anthropic-ai |
ByteDance | Bytespider | Bytespider |
OpenAI | ChatGPT | ChatGPT-User |
Cohere | Cohere AI | cohere-ai |
Google | Google-Extended | Google-Extended |
OpenAI | GPTBot | GPTBot |
Perplexity | PerplexityBot | PerplexityBot |
How to interact with an AEO / AI Answer Engine Bot
Below is a guide on how to interact with AI-based answer engine bots and crawlers by their user agents. For example, Perplexity states on its website that its crawler respects the robots.txt file on your website and follows the rules in it.
To identify: Perplexity says you can identify its web crawler by its user agent.
User agent token: PerplexityBot
Full user agent: User-Agent: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; PerplexityBot/1.0; +https://perplexity.ai/perplexitybot)
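To make the identification step concrete, here is a minimal Python sketch that checks an incoming User-Agent header against a hand-copied subset of the tokens from the table above. The function name `identify_ai_bot` and the case-insensitive substring match are assumptions made for illustration, not something any of these vendors prescribe.

```python
from typing import Optional

# Tokens copied from the table above. The matching strategy
# (case-insensitive substring) is an assumption of this sketch.
AI_BOT_TOKENS = [
    "anthropic-ai",
    "Bytespider",
    "ChatGPT-User",
    "cohere-ai",
    "GPTBot",
    "PerplexityBot",
]


def identify_ai_bot(user_agent: str) -> Optional[str]:
    """Return the first known token found in the User-Agent string, else None."""
    ua = user_agent.lower()
    for token in AI_BOT_TOKENS:
        if token.lower() in ua:
            return token
    return None


ua = (
    "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
    "PerplexityBot/1.0; +https://perplexity.ai/perplexitybot)"
)
print(identify_ai_bot(ua))  # prints "PerplexityBot"
```

A user agent string can be spoofed, so treat a match as a hint rather than proof of who is crawling.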
To allow PerplexityBot to crawl your entire website, you would use the following code in your robots.txt file:
User-agent: PerplexityBot
Allow: /
To prevent PerplexityBot from crawling your website entirely, you would add this code to your robots.txt file:
User-agent: PerplexityBot
Disallow: /
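If you want to apply the same blocking rule to several of the bots listed in the table, writing each group by hand gets repetitive. The sketch below generates the text for you; `robots_rules` is a hypothetical helper name, and the bot list passed to it is only an example.

```python
def robots_rules(bots, disallow=("/",)):
    """Build one robots.txt group per bot, each with the given Disallow paths."""
    groups = []
    for bot in bots:
        lines = [f"User-agent: {bot}"]
        lines += [f"Disallow: {path}" for path in disallow]
        groups.append("\n".join(lines))
    # Separate groups with a blank line and end the file with a newline.
    return "\n\n".join(groups) + "\n"


print(robots_rules(["PerplexityBot", "GPTBot"]))
```

You can paste the printed output into your robots.txt file, or pass a different `disallow` tuple to block only certain directories.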
- Location of robots.txt: The robots.txt file should be placed in the root directory of your website (e.g., https://yourwebsite.com/robots.txt).
- Partial Blocking: If you want to allow access to specific directories while blocking others, you can customize the rules accordingly. For example:
User-agent: PerplexityBot
Allow: /public/
Disallow: /private/
This setup allows PerplexityBot to access the /public/ directory while blocking it from the /private/ directory.
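Before publishing rules like these, it can help to check that they behave as you expect. Here is a minimal sketch using Python's standard-library `urllib.robotparser` to test the partial-blocking rules locally. The helper `allowed` and the sample paths are assumptions for illustration, and Python's parser may resolve overlapping rules differently from a given crawler, so treat this as a sanity check only.

```python
from urllib.robotparser import RobotFileParser

# The partial-blocking rules from the example above.
rules = """\
User-agent: PerplexityBot
Allow: /public/
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())


def allowed(path, agent="PerplexityBot"):
    """Return True if `agent` may fetch `path` under the rules above."""
    return parser.can_fetch(agent, "https://yourwebsite.com" + path)


print(allowed("/public/page.html"))   # True
print(allowed("/private/data.html"))  # False
```

Paths that match neither rule are allowed by default, which is why only the directories you name explicitly are affected.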
It is always a good idea to check the official documentation for reference, as most of these answer engines and LLM organizations publish their crawler and user agent details in their technical docs for developers. I am providing that information below for you to check out.
Additional Resources for further reading