Here’s how you can allow individual search engine bot crawlers through .htaccess if you prefer to address them one by one. There are a few reasons you might want this, which I try to explain below. At some companies, web developers don’t spend their time creating new web assets; they spend it squeezing any remaining SEO juice from the old ones. That’s a game of diminishing returns once you measure the opportunity cost of looking back instead of moving forward.
SetEnvIfNoCase User-Agent .*bing.* search_robot
SetEnvIfNoCase User-Agent .*google.* search_robot
SetEnvIfNoCase User-Agent .*yahoo.* search_robot
SetEnvIfNoCase User-Agent .*bot.* search_robot
SetEnvIfNoCase User-Agent .*ask.* search_robot

Order Deny,Allow
Deny from All
Allow from env=search_robot
Here are some more .htaccess SetEnvIf and SetEnvIfNoCase examples on Apache’s website.
PHP Logic for detecting different search engine crawlers
You may want to serve customized content to different search engine bots in order to repair specific SEO issues you encounter.
Here is the Search Engine Directory of Spider Names
if (stristr($_SERVER['HTTP_USER_AGENT'], "googlebot")) {
    // what to do -- change "googlebot" to other spiders in the list
}
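Building on the check above, here is a minimal sketch that loops over several user-agent substrings instead of testing one at a time. The spider names in the array are illustrative; substitute entries from the spider directory linked above.

```php
<?php
// Illustrative user-agent substrings -- replace with names
// from the Search Engine Directory of Spider Names above.
$spiders = ['googlebot', 'bingbot', 'slurp', 'duckduckbot'];

$ua = $_SERVER['HTTP_USER_AGENT'] ?? '';
$matched = null;

foreach ($spiders as $spider) {
    // stristr() is case-insensitive, same as the single check above
    if (stristr($ua, $spider) !== false) {
        $matched = $spider;
        break;
    }
}

if ($matched !== null) {
    // Crawler detected -- serve or redirect to crawler-specific
    // content here, depending on the SEO issue you are repairing.
}
```

The early break means the first match wins, so order the array from most to least specific if some names overlap.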
Sometimes a server that was supposed to be locked down gets inadvertently crawled by a search engine. You don’t want to open your entire site to all crawlers, so here’s a way to expose only your site-ownership verification file to the crawler, confirm ownership, and perhaps disavow content, if you don’t have anything better to do than massage your site’s SEO and squeeze every drop of juice from it.
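A minimal .htaccess sketch for that scenario, reusing the search_robot environment variable set by the SetEnvIfNoCase lines earlier. The filename google1234567890abcdef.html is a placeholder; use the verification filename your search console actually issues.

```apache
# Deny everything by default (Apache 2.2 syntax, matching the example above)
Order Deny,Allow
Deny from All

# Expose only the ownership-verification file, and only to matched crawlers
<Files "google1234567890abcdef.html">
    Allow from env=search_robot
</Files>
```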
<html>
<head>
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
<title>Web Development is a finite resource</title>
</head>
<body>
Does quality content matter any more?
</body>
</html>