Have Google recently changed how they index your site?
Welcome to Hell…
I subscribe to a number of search-related sites and I’ve recently noticed an increase in the number of articles being written about Google’s supplemental index. According to Google, being in the supplemental index shouldn’t be seen as some kind of penalty, but try telling that to anyone who is hoping to rank well for a competitive keyword.
With all the new articles being written, the continued focus on the supplemental index in all of the forums and the collective hand wringing of webmasters who have seen their sites disappear off the Big G radar, it begs the question:
Have Google changed how they classify what gets thrown onto the supplemental scrapheap?
It’s impossible to get a definitive answer on exactly how many of your pages are actually consigned to the supplemental index. Google Webmaster Tools don’t offer this service so the only way to find out is by running a couple of basic queries through Google.
This one lists the total number of pages in the Google index:

site:www.yourdomain.com

This one returns all of your pages that are marked as being in the supplemental index:

site:www.yourdomain.com -view ***
Neither of these is particularly scientific nor particularly accurate but in the absence of anything else, they’ll have to do. Once you have the data, you can figure out what percentage of your site is sitting in the naughty boys’ corner (the supplemental index) by using this formula:
(100/total number of pages) * number of supplemental results
So to use this site as an example: Google currently list 243 pages, of which 161 are in the supplemental index. 100/243 = 0.41, which multiplied by 161 gives 66%, meaning that two thirds of my content is, to quote Andy Hagans, sitting in the Siberian work camp for web pages.
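If it helps, here’s a quick Python sketch of that calculation (the function name is my own; the numbers are the ones quoted above for this site):

```python
def supplemental_percentage(total_pages, supplemental_pages):
    """Return the share of indexed pages sitting in the supplemental index."""
    return (100 / total_pages) * supplemental_pages

# This site's figures: 243 pages indexed, 161 of them supplemental
print(round(supplemental_percentage(243, 161)))  # 66
```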
Now the last time I looked into this, less than 30% of my site was sitting in the supplemental index so how come the big jump?
I had a look at some of my other sites to see how they stacked up. Most of them were sitting around 5-10% because I’d created them using the techniques I looked at previously, but it was interesting to see that a couple of my older sites which used old-style techniques (keyword stuffing, generic meta data, hidden text, alt tags loaded with keywords) were only sitting around 20-25%. Curious. If any of my sites should be penalised, these guys would be top of the list.
However, all of my WordPress sites were sitting at 50%+. The answer to this, according to many people, is that out of the box WordPress isn’t very search engine friendly. It creates large quantities of duplicate content due to the way in which it puts your articles into a number of different places (i.e. on your front page, in category directories, in date directories, in your feed, etc.), but I’m not convinced that’s the sole reason. Sure, there will be perceived duplicate content issues with this site because I wasn’t so aware of the problem when I started out, but I think it has more to do with Big G themselves. I think they’ve made some changes to the way they categorise the supplemental index.
As I mentioned the other day, Matt Cutts currently has 43% of his site sitting in supplementals. Some of the other sites I read also have an issue – SEOmoz (28%), Stuntdubl (37%), Threadwatch (77%), Problogger (59%) – and I’m sure that this will probably come as something of a surprise to them; I just hope they don’t mind me publishing the figures.
Now these guys have got the advantage of being existing brands so they’re not going to be as affected as I am by the volume of pages that Big G has deemed to be supplemental to requirements. However, in the case of the first three examples, these are major SEO players who would have built their sites based around the fundamentals of SEO so how come they’re copping the flak? Surely something has changed somewhere.
To go from less than 30% to 66% in the space of 3 months just doesn’t feel right. I’ve actually gone out of my way to avoid duplicate content in that time as well by only applying single categories to posts (thanks for the tip, Graywolf), by creating a robots.txt file to prevent Google indexing certain areas of the site and by generally trying to be smarter with the way I use WordPress so this is why I think it’s a change with Google.
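For what it’s worth, the kind of robots.txt I’m talking about looks something like this. It’s only a sketch: the WordPress paths and the date directory are examples, so check them against your own install before using anything like it.

```
User-agent: *
# Keep the WordPress internals out of the index
Disallow: /wp-admin/
Disallow: /wp-includes/
# Block the duplicate copies of each post that live in the archives
Disallow: /category/
Disallow: /2007/
Disallow: /feed/
```

The idea is that each post then only gets indexed at its permalink, rather than once per category, date archive and feed.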
Has anyone else noticed a big hike in the number of their pages consigned to the supplemental index? Is it just on your blog or is it sitewide?