Bots play an active role in AEO too, just as search engine bots and the good old robots.txt file were part of the Search Engine Optimization game. For example, Perplexity AI uses bots for real-time information retrieval. These bots search the internet according to various rules and instructions to gather insights from top-tier sources. This allows answer engines to crawl, index, and learn, so they can provide current information when answering questions.

AE / LLM / Org | Bot | User Agent / Bot name to use
Anthropic | Claude | anthropic-ai
ByteDance | Bytespider | Bytespider
OpenAI | ChatGPT | ChatGPT-User
Cohere | Cohere AI | cohere-ai
Google | Google-Extended | Google-Extended
OpenAI | GPTBot | GPTBot
Perplexity | PerplexityBot | PerplexityBot

How to interact with an AEO / AI Answer Engine Bot

Below is a guide on how to interact with AI-based answer engine bots and crawlers, which are identified by their user agents. For example, Perplexity states on its website that its crawler will respect the robots.txt file on your website and follow its instructions.

To identify the crawler, Perplexity's documentation says: "You can identify our web crawler by its user agent."

User agent token: PerplexityBot
Full user agent: User-Agent: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; PerplexityBot/1.0; +https://perplexity.ai/perplexitybot)
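As a rough illustration of identifying a crawler by its user agent, the sketch below checks an incoming User-Agent header against the tokens listed in the table above. The function name `detect_ai_bot` and the list name `AI_BOT_TOKENS` are hypothetical, and simple substring matching is an assumption made for brevity. User agents can be spoofed, so treat a match as a hint rather than proof.

```python
from typing import Optional

# Tokens copied from the table above; extend this list as vendors publish more.
AI_BOT_TOKENS = [
    "anthropic-ai",
    "Bytespider",
    "ChatGPT-User",
    "cohere-ai",
    "Google-Extended",
    "GPTBot",
    "PerplexityBot",
]


def detect_ai_bot(user_agent: str) -> Optional[str]:
    """Return the first known bot token found in the User-Agent string, else None."""
    ua = user_agent.lower()
    for token in AI_BOT_TOKENS:
        # Case-insensitive substring match (a heuristic, not a verification).
        if token.lower() in ua:
            return token
    return None


perplexity_ua = (
    "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
    "PerplexityBot/1.0; +https://perplexity.ai/perplexitybot)"
)
print(detect_ai_bot(perplexity_ua))
```

A check like this could sit in server-side logging to see which answer engine bots are visiting, before deciding what to allow or block.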

To allow PerplexityBot to crawl your entire website, you would use the following code in your robots.txt file:

User-agent: PerplexityBot
Allow: /

To prevent PerplexityBot from crawling your website entirely, you would add this code to your robots.txt file:

User-agent: PerplexityBot
Disallow: /
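
If you want to apply the same allow or block rules to several of the bots from the table, the file can be generated rather than typed by hand. The sketch below is one possible way to do that. The helper name `build_robots_txt` is hypothetical, and the choice of which bots to allow or block is only an example.

```python
from typing import Iterable


def build_robots_txt(blocked_bots: Iterable[str], allowed_bots: Iterable[str] = ()) -> str:
    """Build robots.txt text with one group per bot.

    Allowed bots get "Allow: /", blocked bots get "Disallow: /".
    """
    groups = []
    for bot in allowed_bots:
        groups.append(f"User-agent: {bot}\nAllow: /")
    for bot in blocked_bots:
        groups.append(f"User-agent: {bot}\nDisallow: /")
    # Groups are separated by a blank line; the file ends with a newline.
    return "\n\n".join(groups) + "\n"


# Example choice only: allow PerplexityBot, block GPTBot and Bytespider.
robots_txt = build_robots_txt(
    blocked_bots=["GPTBot", "Bytespider"],
    allowed_bots=["PerplexityBot"],
)
print(robots_txt)
```

The printed text can be saved as robots.txt once you have reviewed it.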

  • Location of robots.txt: The robots.txt file should be placed in the root directory of your website (e.g., https://yourwebsite.com/robots.txt).
  • Partial blocking: If you want to allow access to specific directories while blocking others, you can customize the rules accordingly. For example:

User-agent: PerplexityBot
Allow: /public/
Disallow: /private/

This setup allows PerplexityBot to access the /public/ directory while blocking it from the /private/ directory.
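
Before publishing rules like these, it can help to check how a parser reads them. The sketch below uses Python's standard-library `urllib.robotparser` to test the partial-blocking rules above. The helper name `is_allowed` and the example URLs are assumptions for illustration, and Python's parser is only one interpretation of robots.txt, so a real crawler may behave differently in edge cases.

```python
from urllib.robotparser import RobotFileParser

# The partial-blocking rules from the example above.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Allow: /public/
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


for path in ("/public/page.html", "/private/report.html"):
    url = "https://yourwebsite.com" + path
    print(path, is_allowed(ROBOTS_TXT, "PerplexityBot", url))
```

With these rules, the parser should report the /public/ URL as allowed and the /private/ URL as blocked for PerplexityBot.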

It is always a good idea to check the official documentation for reference, as most answer engines and LLM organizations publish their crawler and user agent details in their technical docs for developers. I am providing those resources below for you to check out.

Additional Resources for further reading
  1. Learn more about OpenAI's crawlers and bots here
  2. Learn more about PerplexityBot and its configuration data here
  3. Learn more about the Common Crawl bot here
  4. Learn more about Anthropic and ClaudeBot here
