Tuesday, June 10, 2008

The AdSense Pub(lisher) Crawl

It seems like every time I visit our AdSense colleagues in the Dublin office, I get invited out to celebrate a birthday or a promotion with that great Dublin tradition: the pub crawl. Today I'd like to dedicate a few words to another pub crawl. (I can hear your groans throughout the blogosphere.) That's right, I'm talking about the AdSense "publisher" crawl.

As you may know, it's important to allow the AdSense crawler access to the pages that display your ads. If our crawler can't see the content of your pages, your ad targeting may suffer, and with it your earnings. It's also important that we hold all pages to the same policy standards, and we may eventually stop serving ads to pages that the crawler can't access. With this in mind, I'd like to ask you two questions highlighting potential roadblocks to a successful AdSense crawl and let you know what you can do to correct them.

1. Are you using a robots.txt file on a site with Google ads?
If so, you might be inadvertently blocking the AdSense crawler from accessing parts of your site. If you aren't sure what a robots.txt file is, it's a text file that you place at the root of your domain to tell crawlers which parts of your site they may not access. You can find out if you're using a robots.txt file by going to example.com/robots.txt (replace 'example.com' with your own domain name) or by using Site Diagnostics. If you do use a robots.txt file to block certain crawlers from accessing your site, it's a good idea to add an explicit invitation to the AdSense crawler so it knows it's welcome to visit any page with AdSense code. Please keep in mind that the AdSense crawler is separate from Googlebot, the crawler for our search index, so rules written for one do not apply to the other.

To give the AdSense crawler access, add these two lines to your robots.txt file:

User-agent: Mediapartners-Google*
Disallow:
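
If you'd like to sanity-check a robots.txt file before publishing it, the rules above can be tested locally. What follows is only a minimal sketch using Python's standard urllib.robotparser module, not a description of how the AdSense crawler itself evaluates your file. The helper name crawler_allowed, the sample robots.txt contents, and the example URL are all hypothetical. The sketch also writes the crawler name without the trailing asterisk, because Python's parser compares user-agent names by simple substring match and does not treat an asterisk inside a name as a wildcard.

```python
from urllib.robotparser import RobotFileParser

# Crawler name from the post, written without the trailing asterisk
# (an assumption made so that Python's parser matches it).
ADSENSE_AGENT = "Mediapartners-Google"


def crawler_allowed(robots_txt: str, url: str, agent: str = ADSENSE_AGENT) -> bool:
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical robots.txt that blocks every crawler from the whole site.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# The same file with an explicit invitation for the AdSense crawler added.
WITH_ADSENSE = BLOCK_ALL + """
User-agent: Mediapartners-Google
Disallow:
"""

if __name__ == "__main__":
    page = "http://example.com/articles/page.html"
    print(crawler_allowed(BLOCK_ALL, page))
    print(crawler_allowed(WITH_ADSENSE, page))
```

Under these assumptions, the first check should report that the crawler is blocked and the second that it is allowed, while any other crawler stays blocked by the catch-all rule.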

You can use the Site Diagnostics link in your AdSense Reports tab to see whether we're having trouble crawling any pages on your sites. If you're concerned about the privacy of some pages on your site, keep in mind that we don't publish any information retrieved by the Mediapartners-Google crawler (also known as the AdSense crawler) in any index, and that it only crawls pages of your site that contain the AdSense code.

2. Are your pages restricted by a login?
Our crawler will also get tripped up by any page that's only accessible to a logged-in user. If certain pages of your site are only available to users who have logged in, and you place ads on these pages, it's important to give the Mediapartners-Google crawler explicit access to view them too. In this case, the answer is site authentication, which you can find under your AdSense Setup tab. (Please note that you'll need to be migrated to Google Accounts to use this feature.) You can give our crawler access while continuing to prevent other users or bots from accessing the content on your site.

While using the Site Diagnostics tool, you may notice sites that are blocked for other reasons. Please review our Help Center for more information about why your site may be showing up as blocked. By allowing our crawler access to pages hosting Google ads, you'll get the most targeted ads for your pages in return. I think we can all toast to that.

Updated example site
