Are AdSense publishers being favored with more frequent indexing?

June 21, 2006 at 6:15 am

Today I was going to address some of the comments that Stu Drew left about managing to get a high ranking for his private-label rights articles blog entry, but I’m going to defer that to a later time. If you’re interested in that topic, let me point you to an article I’ve written about the so-called “Google Sandbox” that should address some of the questions: Redcowl Bluesingksy: Why the Google Sandbox Doesn’t Exist.

I want to talk some more about Google’s indexing of AdSense pages. In case you hadn’t heard, Googler Matt Cutts confirmed that the AdSense crawler is feeding pages into Google’s new “BigDaddy” search indexes. This confirms what others had noticed about what the AdSense crawler (usually referred to as the “mediabot”) is doing. Or does it?

As always, there are different ways to look at what’s happening. We know that pages crawled by the mediabot are now making their way into the Google search index. What we don’t know, however, is whether those pages are being pushed or pulled into the index. Let me explain.

Let’s think of the innards of the Google search engine as a bunch of black boxes. (Disclaimer: I have no special knowledge of how things actually work internally.) For our purposes, we’re only concerned with three of those boxes:

  1. The manager maintains a list of URLs and decides when each needs to be indexed
  2. The crawler (this is the Googlebot) goes out and fetches pages for indexing
  3. The indexer takes crawled pages and indexes and ranks them using proprietary algorithms

At some point, the manager decides that a given URL needs to be recrawled. It decides this based on age, Google Sitemaps, PageRank, whatever. No one disputes that different sites get crawled with different frequencies, and the manager is the one making those decisions. So it tells the crawler to fetch the page. The fetch may not happen right away, but once it’s done the crawler tells the manager that the page has been fetched, and the manager then passes the page to the indexer for processing.
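To make the three boxes concrete, here is a minimal sketch of that baseline flow. Everything in it is my own invention for illustration: the class names, the `tick` method, the per-URL recrawl interval in hours, and the dictionary standing in for the live web. It says nothing about how Google actually builds these pieces.

```python
class Indexer:
    """Hypothetical indexer: just remembers the latest content per URL."""
    def __init__(self):
        self.index = {}

    def process(self, url, content):
        self.index[url] = content


class Crawler:
    """Hypothetical crawler: 'fetches' from a dict standing in for the web."""
    def __init__(self, web):
        self.web = web

    def fetch(self, url):
        return self.web[url]


class Manager:
    """Hypothetical manager: the only component that decides when to crawl."""
    def __init__(self, crawler, indexer, recrawl_after):
        self.crawler = crawler
        self.indexer = indexer
        self.recrawl_after = recrawl_after  # url -> interval in hours (assumed)
        self.last_crawled = {}              # url -> hour of the last crawl

    def tick(self, now):
        # Crawl every URL that has never been crawled or whose interval is up.
        for url, interval in self.recrawl_after.items():
            last = self.last_crawled.get(url)
            if last is None or now - last >= interval:
                content = self.crawler.fetch(url)
                # The manager, not the crawler, hands the page to the indexer.
                self.indexer.process(url, content)
                self.last_crawled[url] = now
```

In this sketch a page that changes between crawls stays stale in the index until the manager’s interval for that URL expires, which is the behavior the push and pull models are being compared against.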

Now throw the AdSense crawler into the mix and see what happens. The case that concerns the SEO community is if the mediabot pushes its pages directly to the indexer, bypassing the manager’s controls. In this scenario, changes to AdSense pages can potentially be noticed much more quickly than they would through the normal crawling process, giving them an unfair advantage. In this “push” model, the AdSense crawler effectively acts as a secondary manager.
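A rough sketch of the push model might look like the following. The names `PushMediabot` and `on_ad_request` are hypothetical, the dictionary again stands in for the live web, and I am assuming the mediabot visits a page whenever an ad is requested for it.

```python
class Indexer:
    """Hypothetical indexer: just remembers the latest content per URL."""
    def __init__(self):
        self.index = {}

    def process(self, url, content):
        self.index[url] = content


class PushMediabot:
    """Hypothetical AdSense crawler that writes straight to the indexer."""
    def __init__(self, web, indexer):
        self.web = web          # dict standing in for the live web
        self.indexer = indexer

    def on_ad_request(self, url):
        content = self.web[url]
        # Push: the page goes to the indexer with no manager involved,
        # so the manager's recrawl schedule is bypassed entirely.
        self.indexer.process(url, content)
        return content
```

Here every ad request refreshes the index, so an AdSense page is reindexed as often as the mediabot visits it, regardless of what schedule the manager would have chosen.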

The “pull” model, on the other hand, only affects the crawler. When the manager asks the crawler to get the contents of a given URL, the crawler first checks with the mediabot to see if the latter has crawled the page recently, where “recently” can be any reasonable length of time, say 24 hours. If it has, the crawler just returns a copy of what the mediabot saw instead of going out to fetch the page contents again. The manager is still in control in this scenario: only it decides when a page is to be crawled.
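The pull model can be sketched as a crawler that consults a shared cache before going out to the web. As before, this is only an illustration under my own assumptions: `Mediabot`, `PullCrawler`, the `max_age` parameter (in hours), and the `live_fetches` counter are all made-up names, and time is passed in explicitly to keep the example simple.

```python
class Mediabot:
    """Hypothetical AdSense crawler that only records what it saw."""
    def __init__(self, web):
        self.web = web      # dict standing in for the live web
        self.cache = {}     # url -> (hour fetched, content)

    def on_ad_request(self, url, now):
        self.cache[url] = (now, self.web[url])


class PullCrawler:
    """Hypothetical search crawler that reuses recent mediabot fetches."""
    def __init__(self, web, mediabot, max_age=24):
        self.web = web
        self.mediabot = mediabot
        self.max_age = max_age  # what counts as "recently", in hours
        self.live_fetches = 0   # how often we actually hit the site

    def fetch(self, url, now):
        cached = self.mediabot.cache.get(url)
        if cached is not None and now - cached[0] <= self.max_age:
            # Pull: reuse the mediabot's copy instead of fetching again.
            return cached[1]
        self.live_fetches += 1
        return self.web[url]
```

Note that `fetch` only runs when something calls it, which in this model is the manager. The cache saves a round trip to the site, but it never causes a page to be indexed sooner than the manager had planned.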

What I’ve been assuming is that Google is using the pull model, not the push model. Others are assuming the reverse (and the worst), hence the controversy. We need someone from Google to clarify this issue for us…

Originally from An AdSense Blog: Make Easy Money with Google on April 19, 2006, 11:11am

Entry filed under: Misc, SEO.
