The robots.txt file tells cooperating web crawlers which directories and files they should not access. The file plays a part in search engine optimization and website performance. Drupal ships with a standard robots.txt file that prevents web crawlers from crawling specific directories and files. If you're curious, just open the robots.txt file in the Drupal root directory.
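To give a sense of what those directives look like, here are a few representative lines; the exact contents vary by Drupal version, so treat this as an excerpt rather than the full file:

User-agent: *
Crawl-delay: 10
# Directories
Disallow: /includes/
Disallow: /misc/
Disallow: /modules/
# Paths (clean URLs)
Disallow: /admin/
Disallow: /search/
Disallow: /user/register/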
If you need to modify the robots.txt file for a specific site in a multi-site setup, things get tricky because the file is shared across all of the sites. The solution is to use the RobotsTxt module. The module dynamically generates a robots.txt file that can be edited directly from the Drupal administration section for each site in a multi-site setup.
In this article, I'll show you how to set up the module.
Setup RobotsTxt Module
1. Setting up the module is easy: just download and enable it (see the drush sketch after these steps).
2. Remove or rename the existing robots.txt file.
3. Go to Configuration -> RobotsTxt (admin/config/search/robotstxt). If you have not removed or renamed the existing robots.txt file, you will see an error message on this page.
4. Add your changes to the “Contents of robots.txt” text area and you’re good to go.
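If you build sites with drush, steps 1 and 2 can be scripted. A minimal sketch (assumes Drush with Drupal 7; adjust the commands to your own workflow):

# Step 1: download and enable the module.
drush dl robotstxt
drush en robotstxt -y

# Step 2: remove the static robots.txt shipped with core so it
# does not shadow the dynamically generated version.
rm robots.txt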
Export Configuration
If you need to export the robots.txt text into a Features module, use Strongarm to export the robotstxt variable: add features[variable][] = robotstxt to your .info file and update the module.
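As a sketch, a hypothetical feature named my_seo_settings would carry the variable in my_seo_settings.info along these lines (the feature name and the other entries are assumptions, not something the module requires):

name = My SEO settings
core = 7.x
package = Features
dependencies[] = robotstxt
dependencies[] = strongarm
features[variable][] = robotstxt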
RobotsTxt API
The module exposes a single hook, hook_robotstxt(), which allows you to define extra directives in code. This is useful if you want to share a list of directives across all of the sites in a multi-site setup.
The example below adds Disallow rules for /custom-search and /custom-listing to the bottom of the generated robots.txt file, without having to add them manually to the "Contents of robots.txt" text area.
/**
 * Implements hook_robotstxt().
 *
 * Replace "mymodule" with your module's machine name.
 */
function mymodule_robotstxt() {
  return array(
    'Disallow: /custom-search',
    'Disallow: /custom-listing',
  );
}
If you have any questions, please leave a comment.
This is a great module to have, but having to remove the robots.txt file every time you build a site can get awkward, especially in the context of automated site builds with build tools like drush make. There are a few patches in the robotstxt issue queue that help work around this.
Thanks for the insight.