Cleaning up the Technorati bloat

Technorati Search - eightface.com

Technorati is sluggish, and has been for as long as I can remember. Still, it’s always managed to find a few links that other services haven’t. To be fair, other services produce results that Technorati doesn’t, so it’s a bit of a mixed bag. Kottke wrote about it, so everyone else needs to weigh in (it’s kind of fun using the service to track people bitching about it).

One of the site’s core problems is bloat. You’d think it would be obvious that size doesn’t matter when it comes to indexing weblogs. Quality, not quantity. Take a look at Technorati: they whip out their big numbers and slap ’em down at the top of the page. Over 16.1 million blogs and 1.4 billion posts indexed. Pretty impressive, eh? I bet that earns them an extra glance at all the cool nerd parties.

Now, let’s try to come up with a solution that reduces the number of sites in the index, speeds up searches and returns more valuable results. Sounds like a bit of a tough one. So, let’s take a look at a diagram (actually a screen capture, but diagram sounds better). Do you see any big problems? Maybe BlogSpot? Five of the last seven are AdWords abuse/spam sites. I’m not sure how valuable 16.1 million weblogs are when you have crap like this around.

Don’t get me wrong, I have friends who use BlogSpot. It’s a nice, easy way to get started online and maintain a simple weblog. Something has to change, though. I’m sure Google is aware of the problem; it’s probably screwing up their index too.

How do we fix it? Technically, it’s a problem on Google’s end: they own Blogger and should do something about all the fake sign-ups. In the short term, Technorati could remove BlogSpot from the index. It’s a bit of a blanket solution, but it could help speed things up. It probably wouldn’t help their chances of Google buying out the service, though. On the other hand, Yahoo might appreciate the damage to its rival. It doesn’t even have to be a full-out ban on whatever.blogspot addresses; maybe just a holding period, or a required number of links to the site from non-BlogSpot addresses.
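
For the curious, here’s roughly what that softer rule might look like in Python. This is only a sketch of the idea, not anything Technorati actually runs: the function names, the 30-day holding period and the three-link threshold are all numbers and names I made up for illustration.

```python
from datetime import date, timedelta
from urllib.parse import urlparse

# Both thresholds are assumptions for illustration, not real Technorati settings.
HOLDING_PERIOD = timedelta(days=30)
MIN_OUTSIDE_LINKS = 3


def is_blogspot(url):
    """True if the URL's host is blogspot.com or a subdomain of it.

    URLs need a scheme (http://...) for urlparse to find the host.
    """
    host = urlparse(url).hostname or ""
    return host == "blogspot.com" or host.endswith(".blogspot.com")


def should_index(blog_url, first_seen, inbound_links, today):
    """Decide whether a blog belongs in the index (hypothetical rule).

    Non-BlogSpot blogs are always indexed. A BlogSpot blog gets in once it
    has either sat out the holding period or picked up enough links from
    distinct non-BlogSpot hosts.
    """
    if not is_blogspot(blog_url):
        return True
    if today - first_seen >= HOLDING_PERIOD:
        return True
    # Count distinct linking hosts, ignoring other BlogSpot sites.
    outside_hosts = {
        urlparse(link).hostname
        for link in inbound_links
        if not is_blogspot(link)
    }
    outside_hosts.discard(None)  # drop links we couldn't parse a host from
    return len(outside_hosts) >= MIN_OUTSIDE_LINKS
```

Counting distinct hosts rather than raw links is a deliberate choice: otherwise a spammer could link to a fake blog a hundred times from one throwaway page and sail straight in.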