We have discussed before how to control Googlebot via robots.txt and meta robot tags. Both methods have limitations. With robots.txt you can block the crawling of any page or directory, but you cannot control the indexing, caching or snippets. With the robots meta tag you can control crawling, caching and snippets but you can only do that for HTML files, as the tag is embedded in the files themselves. You have no granular control for binary and non-HTML files.
Until now. Google recently introduced another clever solution to this problem. You can now specify robot meta tags via an HTTP header. The new header is the X-Robots-Tag, and it behaves and supports the same directives as the regular robots meta tag: index/noindex, archive/noarchive, snippet/nosnippet and the new unavailable_after directive. This new technique makes it possible to have granular control over crawling, caching, and other functions for any page on your website, no matter the type of content it has—PDF, Word doc, Excel file, zip files, etc. Read more