July 2008

Duplicate Content is Google’s Weak Link

Grizzly | 16 Jul 2008 | Make Money Online

Many of you are familiar with content scrapers. These people steal your work and post your content on their own sites. Some link back to the original source, but many do not. Google maintains that we have little to worry about. In an article on Google’s Webmaster Central Blog in June, Sven Naumann of Google’s Search Quality Team made the following statements:

I’d like to briefly touch on a concern webmasters often voice: in most cases a webmaster has no influence on third parties that scrape and redistribute content without the webmaster’s consent. We realize that this is not the fault of the affected webmaster, which in turn means that identical content showing up on several sites in itself is not inherently regarded as a violation of our webmaster guidelines. This simply leads to further processes with the intent of determining the original source of the content—something Google is quite good at, as in most cases the original content can be correctly identified, resulting in no negative effects for the site that originated the content.

He goes on to say:

Generally, we can differentiate between two major scenarios for issues related to duplicate content:

  • Within-your-domain-duplicate-content, i.e. identical content which (often unintentionally) appears in more than one place on your site
  • Cross-domain-duplicate-content, i.e. identical content of your site which appears (again, often unintentionally) on different external sites

He tells us that the first scenario can be avoided by listing the preferred version of each URL in your Sitemap file. When Google encounters different pages with the same content, this may raise the likelihood that it serves the version you prefer.
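For anyone who wants to try the Sitemap approach, here is a minimal sketch in Python of what "listing only the preferred version" could look like. It is an illustration, not anything Google published. The host `www.example.com`, the choice of `https`, the rule of dropping query strings, and the function names `preferred_url` and `build_sitemap` are all my own assumptions. Only the `urlset`/`url`/`loc` elements and the namespace come from the Sitemap protocol itself.

```python
from urllib.parse import urlsplit, urlunsplit
from xml.etree import ElementTree as ET

# Namespace defined by the Sitemap protocol (sitemaps.org).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def preferred_url(url, host="www.example.com"):
    """Map a URL variant onto one preferred form.

    Assumption for this sketch: the preferred form uses https, a single
    fixed host, and no query string or fragment. Adjust to your own site.
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    return urlunsplit(("https", host, path, "", ""))


def build_sitemap(urls, host="www.example.com"):
    """Return Sitemap XML that lists each preferred URL exactly once."""
    # dict.fromkeys removes duplicates while keeping first-seen order.
    unique = list(dict.fromkeys(preferred_url(u, host) for u in urls))
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for u in unique:
        url_el = ET.SubElement(urlset, "url")
        ET.SubElement(url_el, "loc").text = u
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    variants = [
        "http://example.com/post?utm_source=feed",
        "http://www.example.com/post",
        "http://www.example.com/about",
    ]
    print(build_sitemap(variants))
```

The first two variants collapse into a single entry, so the Sitemap names only one address for that page. Whether Google actually honors the hint is a separate question; as the quote says, it only "may help."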

The second scenario is the one most of us are concerned with.

In the second scenario, you might have the case of someone scraping your content to put it on a different site, often to try to monetize it. It’s also common for many web proxies to index parts of sites which have been accessed through the proxy. When encountering such duplicate content on different sites, we look at various signals to determine which site is the original one, which usually works very well. This also means that you shouldn’t be very concerned about seeing negative effects on your site’s presence on Google if you notice someone scraping your content.

From my own experience I am going to tell you that this is horseshit.

The fact is that any site with more PageRank (PR) and authority can post all the duplicate content it wants and outrank the original source. How do I know this? For the past six months I have been scraping content off of GoArticles for several niche blogs that I had originally built up to PR3 with original content. I started with snippets and now have several hundred pages of scraped content on these blogs. So what happened to my rankings? One blog is now number 1 in the SERPs for its primary keyword. Two others are on page 1, and three more are hovering between pages 1 and 2. Not a single scraped article ranks lower than the original source. Not one. In fact, most of the original articles can’t even be found in the SERPs.

What is more interesting is that I have been able to scrape content from some higher-PR sites and still outrank the original source with a low-PR blog, simply by accumulating several keyword-optimized backlinks from pages with decent PR.

This is not a call for you to start scraping content. I simply did this to find out once and for all if Google really can detect and, more importantly, weed out scrapers from the index. The fact is that they can’t. When they encounter duplicate content, they simply fall back on which site has the greater authority and the most keyword-targeted backlinks. The only time they seem to get it right is when the scraper leaves in links that point back to the original source.

So what recourse do you have if a scraper is stealing your content and outranks you? Technically, Google allows you to report these sites in Webmaster Tools. Go ahead, but don’t expect much. I have reported several sites that regularly steal my MMO content. Has Google de-listed any of them? Nope. The good news is that none of these scrapers outrank me, but they are still listed in the SERPs.

Note: RT over at Untwisted Vortex has a good article discussing some of the things you can do with scrapers. See… Defeating Bad Scrapers the Free and Easy Way

The end result is that Google has been putting on a front with regard to duplicate content. The fact is that they have no automated system to weed it out. They can only do it manually, and they simply don’t have the manpower. An automated system would not be feasible anyway, as they would lose far too many top earners, like the news organizations that scrape content as a rule. Google doesn’t have an answer aside from maintaining an exaggerated account of what they are capable of: make people think they will be penalized, and hopefully this will stop them from scraping content. Have you noticed that scrapers are becoming ever more prevalent? It seems more and more people are calling Google’s bluff.
