Duplicate Content is Google’s Weak Link

Many of you are familiar with content scrapers. These people steal your work and post your content on their own sites. Some link back to the original source but many do not. Google maintains that we have little to worry about. In an article on Google’s Webmaster Central Blog in June, Sven Naumann of Google’s Search Quality Team made the following statements.

I’d like to briefly touch on a concern webmasters often voice: in most cases a webmaster has no influence on third parties that scrape and redistribute content without the webmaster’s consent. We realize that this is not the fault of the affected webmaster, which in turn means that identical content showing up on several sites in itself is not inherently regarded as a violation of our webmaster guidelines. This simply leads to further processes with the intent of determining the original source of the content—something Google is quite good at, as in most cases the original content can be correctly identified, resulting in no negative effects for the site that originated the content.

He continues:

Generally, we can differentiate between two major scenarios for issues related to duplicate content:

  • Within-your-domain-duplicate-content, i.e. identical content which (often unintentionally) appears in more than one place on your site
  • Cross-domain-duplicate-content, i.e. identical content of your site which appears (again, often unintentionally) on different external sites

He tells us that the first scenario can be avoided by including the preferred version of your URLs in your Sitemap file. When encountering different pages with the same content, this may help raise the likelihood of Google serving the version you prefer.
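
As a rough illustration of that Sitemap suggestion, here is a minimal Python sketch that writes a Sitemap listing only the version of each URL you prefer. The `build_sitemap` helper and the example.com address are made up for illustration. The `urlset`, `url` and `loc` elements and the namespace come from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(preferred_urls):
    """Return a Sitemap XML string with one <url><loc> entry per preferred URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for address in preferred_urls:
        url = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as "&" when serializing.
        ET.SubElement(url, "loc").text = address
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical case: the same post is reachable at several addresses,
# but only the preferred one is listed in the Sitemap.
preferred = ["https://www.example.com/duplicate-content/"]
sitemap_xml = build_sitemap(preferred)
print(sitemap_xml)
```

Listing a URL this way is only a hint. As the quote says, it may raise the likelihood of Google serving that version, nothing more.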

The second scenario is the one most of us are concerned with.

In the second scenario, you might have the case of someone scraping your content to put it on a different site, often to try to monetize it. It’s also common for many web proxies to index parts of sites which have been accessed through the proxy. When encountering such duplicate content on different sites, we look at various signals to determine which site is the original one, which usually works very well. This also means that you shouldn’t be very concerned about seeing negative effects on your site’s presence on Google if you notice someone scraping your content.

From my own experience I am going to tell you that this is horseshit.

The fact is that any site that has more PR and authority can post all the duplicate content it wants and outrank the original source. How do I know this? For the past 6 months I have been scraping content off of GoArticles for several niche blogs that I had originally built up to PR3 sites with original content. I began scraping snippets from GoArticles and now have several hundred pages of scraped content on these blogs. So what happened to my rankings? One blog is now number 1 in the SERPs for its primary keyword. Two others are on page 1 and three more are hovering between page 1 and 2 in the SERPs. Not a single scraped article ranks lower than the original source – not one. In fact most of the original articles aren’t even found in the SERPs.

What is more interesting is that I have been able to scrape content from some higher PR sites and still outrank the original source with a low PR blog simply by accumulating several decent-PR, keyword-optimized backlinks.

This is not a call for you to start scraping content. I simply did this to find out once and for all if Google really can detect and more importantly weed out scrapers from the index. The fact is that they can’t. When they encounter duplicate content they simply resort to who has the greater authority and the most keyword targeted backlinks. The only time they seem to get it right is when the scraper leaves in links that point back to the original source.

So what recourse do you have if a scraper is stealing your content and outranks you? Technically Google allows you to report these sites in Webmaster Tools. Go ahead but don’t expect much. I have reported several sites that regularly steal my MMO content. Has Google de-listed any of them? Nope. The good news is that none of these scrapers outrank me but they are still listed in the SERPs.

Note: RT over at Untwisted Vortex has a good article discussing some of the things you can do with scrapers. See… Defeating Bad Scrapers the Free and Easy Way

The end result is that Google has been putting on a front with regard to duplicate content and the fact is that they do not have a means to weed it out with an automated system. They can only do it manually and they simply don’t have the manpower to do so. An automated system would not be feasible as they would lose far too many top earners like the news organizations who scrape content as a rule. Google doesn’t have an answer aside from maintaining an exaggerated account of what they are capable of. Make people think they will be penalized and hopefully this will stop them from scraping content. Have you noticed that scrapers are becoming ever more prevalent? Seems more and more people are calling Google’s bluff.

  1. Don Mak says:

    Griz,

    Is it possible that Google is also determining via WHOIS that you are the owner of all the sites in question? I seem to recall reading that Google does integrate WHOIS info, as well as whether or not the other websites in question are sharing the same IP address, etc.

    Just a thought and I’d be interested in hearing your replies.

    Thanks,

    Don Mak

    • Grizzly says:

      Don,

      I can’t say if they use whois or not but many people including the smarter scrapers tend to use private registration and this means that Google would not have the kind of info you refer to. My point is that I don’t believe Google is doing anything about this issue – I don’t think they really can and simply lead people to believe they are in order to try and contain it.

      • penny stock says:

        I use duplicate content all the time but it is in the form of news releases in the financial/stock sector. However I do change the title around a bit. I sometimes outrank some of the big news sites for various news releases…everyone will have the same news release but mine ranks higher on some, not all but some…go figure. Even picked up a couple of PR6 links from google trends posting duplicate news releases just like the large news sites do.

        However, what I’m doing is re-posting news released by individual companies that would be more than happy for me to use it for my blog. More “eyes” on their news which is the purpose of a PR/news release in the first place. But scraping someone’s content, not giving credit (via a link) and outranking the original source… in a word it is “chicken shit”

  2. I actually played around with this before I quit Vic’s competition. I was following the blog posts of the other bloggers. When two or three would post about the same thing close together, I’d directly copy the content from all three of their posts. I put two in my post. Then I made two or three big comments of unique content, pretending to be visitors rambling. Then I dropped the content of the 3rd post into a comment. I’d instantly BMD the post 3 times on scuttles, then I went and grabbed a couple dofollow blog comments. Then I went to other posts on the competition blog and left comments on my other posts but anchored a link to the scraped post (since our competition blogs were dofollow). I did this to build internal authority for the KW. Within 1 to 2 hours I’d outrank all three blogs I copied.

    I learned a lot from that little experiment. Google can determine duplicate content, but it isn’t that great at picking the original. The algo is dependent on these “signals”. Whichever site has enough signals to prove its worth becomes the one with the most authority. Google is more concerned about ranking the one with the most authority than they are ranking the original.

    In my example, I was on the same domain as the other three blogs. I had ALL the content, which made it appear their pages had snippets of “my” content. Google most likely assumed mine was original and they were some form of on site duplication. Second, I accumulated the most links first and did it quickly after the post. Even though they’d post their articles 3 to 6 hours before me and already be indexed, I could get indexed and outrank them in 2 hours. So the time Google finds an article only seems to be a minor factor. I think the factor that is more important is which page gets the best links first.

    About reporting. Yeah, Google is horrible at following up on that. If you seriously want to report, I suggest that people always log in to Webmaster Tools. They take the verified reports a little more seriously than the anonymous ones.
    If a scraper outranks you, screw Google, and use SEO =P Go publish 2 to 3 articles and mass distribute them. Then go grab some dofollow comments (prob don’t even need to be anchored). Then, rework your article to improve your on site SEO. That should outrank most scraper sites, unless they have massive authority.

    (that was a long comment)

    • Grizzly says:

      Justin,

      I have found the same results and agree – the fact is that scrapers with SEO knowledge can prosper just fine online. Yes Google can determine duplicate content but they simply rank it based on backlinks – a pretty easy system to game as you have pointed out.

    • Tevin says:

      I just want to thank you for that comment justin. That’s really an awesome technique.

  3. And another thing to note, if both copies have enough authority, then both will rank. News sites have duplicate content all the time, but they don’t get sent to supplemental results.

    I think this concept can be applied to improving your links. You can help combat duplicate content problems associated with article marketing by building links to some copies of them. As long as it gets enough links, it shouldn’t suffer too much of a duplicate content slap, and the link will count for more.

  4. Ben Willis says:

    I have had content scraped in the past, but they rearranged some of the phrases and replaced a few words but left no credit or link back to me. I notified Google, who agreed that it was still copyright infringement and had them removed. I just don’t have time to chase around every article that I have ever written to report copyright infringement separately. I now leave a link back to the original article in the first sentence of every post that I write. Ben

    • Grizzly says:

      Ben,

      It sounds like you and a few others like RT have had more success at having scrapers removed by G than I have. I don’t much care if the scraper links back to me but several simply steal my entire posts with nada a link and while they don’t rank worth a damn it still irks me. I should mention my test only involved snippets of text and not entire posts. Leaving a link in the first paragraph is a smart move as a lot of the scrapers never remove links – a good way to gain a few links even low quality as they are.

  5. Hey Grizz (and Justin),

    Thanks a lot for yet another informative post. I was under the impression that they were much better at determining scraped material and doing something about it. Somebody doing this part time (myself) can only spend so much time doing actual experimentation. This is quite an eye opener. Duplicate content is something that perhaps we worry a little too much about. I don’t see myself scraping anytime soon but it always helps to have more information. Appreciate it guys.

    Bruce

    • Grizzly says:

      Bruce,

      G does seem to be able to determine duplicate content just fine – they have a problem determining the source though and because of this they fail to act in most cases.

  6. Mike Debyah says:

    Great discussion going on here! There’s a thread in the BANS forums about using RSS feeds on your sites to have content that is constantly updated automatically. There’s a question of would this give you a “duplicate content penalty”? Any thoughts on this? From your post, it sounds like it’s not something to worry about too much.

    • Grizzly says:

      Mike,

      Many people are now using RSS feeds as a way to keep fresh content on blogs that don’t get much attention. Snippets of duplicate content definitely don’t seem to cause G much concern and with feeds the link does lead back to the source although without any juice.

  7. April says:

    Thanks Griz and Justin. I really like it when people actually spend the time to test things out. It seems to me that a lot of people just randomly throw ideas about without having anything concrete to back it up with.

    I haven’t really got anything to add other than I always try to get links to every single page of my sites. I think a lot of people forget about supplemental results these days because google doesn’t show it any more. However it most certainly still exists.

    • Grizzly says:

      April,

      This is one of those tests where the results can make one resort to the old blackhat ways – I was surprised that every single blog has prospered in spite of the duplication. This does tell me why there are so many scraper sites these days and it will probably get a lot worse until G finds an answer.

  8. I use Google Alerts to let me know when someone even has my keywords in their post. Yeah, I get a lot of email, but it’s worth it. I’ve used the webmaster tools and successfully had several sites completely de-indexed. I don’t have a problem with scrapers that pull in only part of my posts, but when they snatch the whole thing then LOOK OUT. :-)

    • Grizzly says:

      RT,

      I use alerts as well but man… do you know how many times a day “Grizzly” is used or a variation? I could hire someone just to go through the mail! Lol. You are right though – it’s the “full” content scrapers that really get me wound up – bastards!

  9. Sonni says:

    Hi Everybody,

    Funny you would be talking about duplicate content. I tried to submit an article today on Digg and couldn’t. It came back at me and told me it was a duplicate. The article I was submitting was my Squidoo lens. It may have been a duplicate, but it was my duplicate. This didn’t make sense to me as I went on about my business and didn’t think about it again until now. What do you make of this?
    Sonni

    • Grizzly says:

      Sonni,

      I am so out of touch with the social networks that I can’t really give an opinion. Wouldn’t a duplicate on Digg mean that someone else has already submitted your article?

  10. Costa says:

    wow, I just wish I had half the knowledge about SEO like you and Justin. I just let scrapers be, because most of them just post an excerpt and link it back to my original post with some silly names. LOL

    I’ve seen your articles being scraped entirely together with all the links, names and everything in it. Even got a few backlinks that way because some of the articles have my links in it. Dunno if that’s a good thing though.

    • Grizzly says:

      Costa,

      I get “anonymous” a lot when the scrapers link back. Is it too much work for the buggers to at least look up our names before posting? lol.

  11. dcr says:

    Good to know that Google can distinguish between internal duplicate content. I always suspected that was the case, but the Google “experts” always warned it was a bad idea.

    Too bad, though, that they are still apparently unable to correctly identify duplicate content elsewhere on the web. I, and I’m sure many people, can easily spot a scraper site. You’d think that, by now, Google would be able to as well.

    • Grizzly says:

      Dcr,

      I haven’t had problems with internal content – G does seem to handle this quite well. As for external sources I believe it just comes down to manpower and lack of will.

  12. Don says:

    Wrote about it today Grizz, they are getting rather ballsy indeed and it’s one of those things I guess we just accept, ’cause it doesn’t look like anything is going to change

    • Grizzly says:

      Don,

      I read your post,

      I’m Thinking Mark Ress Wont use this Post

      and had to laugh. I once posted screenshots of the scraper site in a post and the silly bugger still posted the article – an article calling him a thief! (He was kind enough to leave my link in as well)

      So… did your scraper post the article outing himself?

  13. Justin G says:

    What a tasty post. Enjoyed it Griz!

  14. raki says:

    ..just got to say i love the new site, i’ve been reading up on the ‘ugly’ site and just gotta give major props to you Grizz!

    ..i’m just starting out, maybe around 6 weeks into it and learning a lot. Just have a question about the topic.

    ..I post like maybe 200 words per blog post and bookmarking them, not really submitting them to article directories. I am thinking of letting the post pile up first, compile them into a huge article later, then submit them to article directories.

    ..will google see this as a duplicate?

    • Grizzly says:

      Raki,

      You can do as you suggested but why not take a little time and use original just to be sure that the links will give you better juice. The one unknown is how much juice G credits a duplicate article with so why not just do the extra work and avoid any questionable tactics.

      • roger says:

        does it mean, re-writing the article, Grizz.

        • Grizzly says:

          Roger,

          Yup… it means re-editing more than re-writing the whole thing. Change up the sub headers, paragraphs and use synonyms for your main text. With a little practice you can learn to do this fairly quickly and use copyscape to see if you pass.
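
To get a feel for how much of a re-edit still overlaps the original, here is a minimal Python sketch that compares two texts by their shared three-word phrases. This is only a generic word-shingle comparison with Jaccard similarity. It is not how Copyscape or Google actually score duplicates, and the `shingles` and `overlap` helpers and the sample sentences are made up for illustration.

```python
import re


def shingles(text, size=3):
    """Return the set of overlapping word sequences of length `size`."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def overlap(original, rewrite, size=3):
    """Jaccard similarity of the two shingle sets, from 0.0 to 1.0.

    Two empty texts are treated as having no overlap.
    """
    a, b = shingles(original, size), shingles(rewrite, size)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


original = "Google can detect duplicate content but it struggles to pick the original source."
light_edit = "Google can detect duplicate content but it struggles to identify the original source."
reworked = "Spotting copies is easy for a search engine; working out who wrote them first is harder."

# A single swapped synonym leaves most three-word phrases intact,
# while a real rework shares none of them.
print(round(overlap(original, light_edit), 2))
print(round(overlap(original, reworked), 2))
```

A score near 1.0 means the rewrite is still mostly the original wording. What counts as "different enough" for any real checker is not something this sketch can tell you.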

  15. Griz if you get up early in the morning and feel like chattering about a few things let me know. I will be up.

  16. Mrinal Bose says:

    Nice post, Griz. You’re simply amazing in providing us useful information and insight on a regular basis. Many thanks.

  17. Personally, I’m more than happy to have a scraper steal my content. As long as you build enough authority, the scraper won’t outrank you. I want Google to keep them in the index, because then it’s a back link for me. Of course Google discounts scraper links, but they are not 100% at detection and not 100% at discounting. If the scraper links to me, especially with my anchored text intact, I’m happy =)

    It’s no different than getting BMD imo. Horrible quality links, but massive volume of anchored links… I’ll take that, especially when they can be dismissed as “not my fault”.

    Griz… this blog is going to take off… the massive amount of content you get through comments…lol. And you already have a fair amount of anchored backlinks.

    • Grizzly says:

      Justin,

      So far I have avoided the sandbox which will be a topic for a new post. I’m not 100% sure but I believe I know how to avoid it now fairly consistently. Thanks for the support and I hope this does take off – the more authority sites we can all build between ourselves the more we can help each other.

  18. Terry says:

    Good stuff Griz,

    I know of a person (6¿6) who has been running an experiment by regularly scraping news stories off certain um, local television websites and posting them on his blog, which is nowhere in the SERPs just now, but I’m willing to bet that in due time that blog will gain a fair bit of authority just out of his pure dogged persistence…

    As for my own stuff being scraped, it is irksome and annoying to say the least. So far stuff I have tracked has only turned up on de-indexed or sandboxed blogger blogs from strange faraway countries so I’m not too worried about it.

    • Grizzly says:

      Terry,

      Glad you dropped in. Totally off topic but I added the Comment nesting plugin but it doesn’t seem to be working – any chance you could check and see if I have buggered something up… aren’t you glad you dropped in? Lol.

      I get a lot of PR N/A sites with no content other than my own pop up all the time but the sites are so bad that they wont ever see the light of day from the sandbox. Almost feel sorry for them wasting their time – maybe they should read some of the content they are stealing! Lol.

      • Terry says:

        PS: Watch your Akismet, while you’re already getting spammed, I just released a post from John Tighe that had been sitting in there! I also changed your setting that treats links in posts as spam back to two, as my initial response got Akismeted for the single example link (which I subsequently altered)… D’OH!

    • Terry says:

      No problem Griz – lucky I happened by, it’s fixed! You didn’t do anything wrong it just needed a tweak – something I’m pretty good at!

      As for altering your post headers to link back to the homepage – don’t know if you want to do that. I’ve got them set to generate SEO post titles using the category/post-title as part of the url. To link back to the homepage on my blogs I generally put a link in my sign-out, ie:

      blah blah blah blah blah blah blah blah,
      blah blah blah blah blah blah blah blah,

      Griz,
      <a href="full-url">Make Money Online</a>

      If you just use http:”/” it links back to the homepage ok, but you don’t get that link if someone scrapes your post. Same goes for linking back to the previous post (if you do that) WP has a nasty habit of truncating the link, so you have to go into the HTML panel and manually write the link. Oh the joys (and extra work) of WordPress!

      Terry

      • Grizzly says:

        Thanks for the tweak Terry – I’d be interested to know what you did but only if you can explain it in English in less than 5 words! lol.

        As for the homepage link I think you are right – I will manually add them and leave the current default as is.

        Thanks as always Terry.

        • Terry says:

          Now that would be giving trade secrets away…

          Oh, ok then LMAO… I just went to the plugin creator’s site and read through the install notes and yep, there is a minor code change needed for some WP templates – it took me about a minute to fix it!

          I wish everything was as simple as that!

          I’m shooting you an email about “The first of many”…

  19. DennisJr says:

    I know I have said this on the other site. However, I think it’s worth repeating what Grizz suggested to me. Hide a link in a period (.). I do this if I think the article is a decent article and might be worth stealing. I have caught two people using Grizz’s suggestion. I did this because on one of my blogs they stripped out all my links and reposted. Of course, their blog had higher PR than my blog did.

    • roger says:

      how to do this. please more details.

      • roger says:

        also, when I submit article to directories, they won’t allow to put anchored link back to my site in the body. I am using resource box. someone said here, they put the link in the first line of the article.
        is this ok or how to do that. thanks.

      • Grizzly says:

        Roger,

        It’s done the same as any link but instead of using a keyword in your anchor text just insert a period in place of the keyword.

        Some directories like GoArticles allow you to add links in the post and some directories don’t. If the directory allows links then always put a link in the first few lines of the first paragraph.

  20. Don says:

    Grizzly, LMAO my post showed up today on his Hub Page, Kinda funny

    Don

  21. AB says:

    I have a website that I have created 60+ posts out of nothing but snippets from news articles. There has never been any other content whatsoever over two years. It is designed to promote 1 affiliate product. The site makes between $1000 and $1200 per month since late 2007 and has a PageRank of 2.

    I do link back to the original with dofollows, but still it is amazing to me that G hasn’t called me on it.

    • Grizzly says:

      AB,

      Funny you should mention “news” articles. These are the most commonly scraped snippets and G totally overlooks them. Probably because all the major news networks use the exact same “AP” or “Knight Ridder” feeds and publish the exact same articles far and wide. None are penalized for it.

  22. Congrats on the new WP blog. Looks nice.

  23. Ian says:

    Hey Griz,

    I noticed you’re setting up this blog in a similar fashion to your Blogger blog.

    Didn’t know if you wanted to set your post titles to link to the home page like you do with your Blogger set-up.

    If so you need to replace

    <a href="">

    With

    <a href="/">

    Terry will know how to sort it…

    Nice blog pal

    • Grizzly says:

      Ian,

      Thanks for the tip. I think we have figured out a plan. Appreciate your help. :-)

      • I always wondered why you linked your headings back to your homepage on your blogger blog. What is your reason for this? Just to build internal authority for the home page? or what?

        • Grizzly says:

          Justin,

          The short answer is quite simple. Without going into the whole inter – linking strategy and getting duplicate entries in the serp’s my main reason for this is to send PR for all my keywords back to the homepage.

          While a fresh post has no PR it will in 3 months time (usually) as I build links to every page. Eventually (hopefully) I will have dozens of PR 3 and 4 pages, each ranking for different keyword authority, pointing back at my home page which basically increases its keyword authority for countless terms. This is one of the reasons my home page doesn’t just rank well for a single keyword but for dozens of them.

  24. Ian says:

    Oh bugger :) I’ll email you or find Terry (Terry should know how to do it anyway)

  25. Denise says:

    I guess I am a little dense here! So is it OK to use a snippet of duplicate content as long as you link back to the original source?

    If I could make the kind of money AB makes from only one site I would be thrilled!

    If Google does not care and I give credit to the author, would it not be a good thing?

    • @Denise:

      I’m butting in because I like to butt in.

      If the snippet is really short, like 150 characters or less, you don’t even need to link back to the source. Anything more and it’s a good idea, if only to keep the source owner from flipping a wig. I’m talking about one or two paragraphs only.

      • Denise says:

        RT, Thanks for clearing this up. I am definitely going to use this more. It will give me a lot more time to get things done.

        With my computer broke, I have to get as much into a short period of time as possible using all my kids computers.

        Got to figure out how to replace a mother board in my 4 year old XP computer. I knew I should have gone into computer science when I was in college. Maybe it is time to go back to school. Even at almost 60 I need to keep up with this computer stuff. :-(

  26. Sonni says:

    I don’t know what I’m doing is the problem. In one of the work at home forums they said to submit articles to hither and yon so I went about trying to do that. Obviously I messed that up. Silly newbie that I am. I don’t know much about those social communities or understand what they do. I might just need to stay out of there. Thanks for all the great info you provide. I’m trying to learn all I can.
    Sonni

    • Grizzly says:

      Sonni,

      The basic gist is to simply get links (using your keyword if possible) pointing back to your blog from as many sources on the net as you can. Article directories and the social networks are some of the more accessible places available for leaving such links. It’s a slow process that unfortunately needs to be done in order to achieve both PR and serp rankings.

  27. Rhys says:

    Hi Grizzly!

    I like the concept of hiding a link in the lead paragraph, at least that way there is an odds on chance of acquiring a back link. It’s an interesting problem though, how to make any keyword gain using a period as anchor text?

    It’s a nice compliment when a person thinks enough of our work to copy and display it as their own, so if we can’t prevent it, perhaps we ought to lean back and enjoy it?

    I’m reminded of the interesting proposition that; “When we steal one persons work it is called plagiarism, but when we steal from many writers it becomes research”!

    • Rhys, you have to be careful with the “.” thing. Google will penalize for it. I didn’t know that until Matt Cutts specifically wrote about it.

      • Grizzly says:

        RT,

        You are correct but I still do it sparingly and haven’t had a problem but caution is the keyword. G has a problem with people using it for re-directs and link spamming but I figure since I do neither and am simply linking my own article back to my own page I shouldn’t have a problem. I haven’t but this could change knowing G. It’s good that you mentioned this though – I don’t want folks to start dropping tons of hidden links on their article directory posts etc…

        (The links have no juice value anyway – unless you want to rank number 1 for a period!) :-)

  28. Fiar says:

    I remember Frank leaving a comment about rewriting the content just enough that it’s no longer duplicate content. In that case it was someone that just didn’t know enough not to copy and paste, and there was a link back, but it’s still there as an option if you’re getting scraped.

    I was already under the impression that the higher PR gets counted as the source.

    Yay! I can subscribe to comments.

    • Grizzly says:

      Fiar,

      Higher PR and/or the most anchored backlinks tends to win. You can thank Terry for the comment subscription – whatever I did didn’t work… Terry to the rescue…

    • Frank C says:

      Doing a full article rewrite enough to pass Copyscape is generally a good idea, probably more to avoid pissing someone off and having them gunning for you than dealing with Google’s algorithms.

      Google also seems to be good at going after sites that use full Wikipedia articles, probably because they are/were popular with YACG users. I’ve heard the same about content taken directly from popular article sites as well. I wonder if they have a database of popular scraping targets.

      As for copied content, I’ve occasionally had problems where someone copied one of my product reviews and pasted into a high PR forum. I get sent to the dupe filter or reduced in SERPs when that’s happened so I do a quick rewrite and throw in some comments and I’m back where I was.

      • Grizzly says:

        Frank,

        That’s a great tip if you find a scraper is outranking you. I have also found that a dozen quick links to my original takes care of the rankings on the odd occasion that I do get outranked – although not everybody has this option.

  29. goldcoaster says:

    In regard to part or full feed scraping: on one of my Blogger blogs someone is scraping the feed. Part feed on Blogger has no links included but the full feed has links so I changed it to full feed to get the links back to my site – I thought it’s a good idea but it seems I am wrong? What do you think?
    I used the webmaster link to report it at first but Gs email back said I need to have copies of every page affected and pretty much have a solicitor to certify it all and send it to Goog.. couldn’t be buggered to do all that.

    • Grizzly says:

      GoldCoaster,

      I use the full feed as well – I always take the link if I can get it so I wouldn’t say you are wrong.

    • Frank C says:

      Some people do partial scrapes directly from Google Blog Search or other sources like that. This is just a price to pay for syndicating your content. If you want truly private content then you’ll need to cut Google and other syndication out of the deal by either doing a private membership site or doing an email list.

      Now, if you use pictures in your posts and the scraper hotlinks back to your pictures, you can have some fun with the scraper by swapping out your original picture with a big honkin’ 3000×3000 picture. :)

  30. [...] about making money online in a less SEO but more user friendly format. Check out his article on Google’s weak spot when it comes to duplicate content. Knowing how Google works is a key skill to learn, since they [...]

  31. Greg says:

    Griz (or anyone else who is doing a bit of scraping),

    What tools are you using to facilitate this? RSS2blog or are you doing it by hand and not automated?

    Thanks,

    Greg

    • Grizzly says:

      Greg,

      I created the posts by hand. I should mention that I added a number of SEO tags to the posts in order to target the keywords I was after. (Different title and sub headers etc) I have no plans on continuing this practice though – I just wanted to see what the results would be. If you are planning on doing this then be careful…

  32. lissie says:

    So what I am hearing is that I should set up several free blogs and scrape snippets of my own content for the backlinks? Is it important to have only 1 broad topic per “scraped” blog?

    • Grizzly says:

      Lissie,

      Let me make this clear – I am not promoting this idea and have simply laid out some info that I have come to realize. What you do with this is your own decision but please don’t come back to me if you find yourself in trouble.

      How’s that for covering my ass?

      As for building backlinks – yes you can do what you suggested however if you are only using Blogger for example, then no more than 4-5 backlinks as they all come from the same C class IP address. To do this effectively you will need various blog platforms (different IPs). Btw… I didn’t just tell you to do this – I just told you what I would do if I was going to do this – which of course I am not! :-)
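
      If you want to see how your backlinks spread across "class C" blocks, a small sketch like the one below would do it. It assumes "same C class" means the same /24 network (the first three octets match), handles IPv4 only, and takes the 4-5 limit above at face value as a rule of thumb rather than documented Google behavior. The names `links_per_class_c` and `crowded_blocks` are hypothetical.

```python
import ipaddress
from collections import Counter


def links_per_class_c(link_ips):
    """Count backlinks per /24 network (what SEOs call a "class C" block)."""
    counts = Counter()
    for ip in link_ips:
        # strict=False lets us pass a host address and get its enclosing /24.
        network = ipaddress.ip_network(f"{ip}/24", strict=False)
        counts[str(network)] += 1
    return counts


def crowded_blocks(link_ips, max_per_block=5):
    """Return the /24 blocks holding more links than max_per_block (assumed limit)."""
    counts = links_per_class_c(link_ips)
    return {net: n for net, n in counts.items() if n > max_per_block}
```

      Feed it the IP addresses of the linking sites; anything it returns is a block where, by this rule of thumb, the extra links may be doing little for you.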

  33. [...] Online blog is one of the few that provides me with intellectual stimulation, just check out his duplicate content post. The comments have lots of great discussion with real people talking and sharing knowledge. I [...]

  34. [...] I've said my 1,000+ word spiel about bad scrapers inspired by Grizzly's article, "Duplicate Content is Google's Weak Link", tell me what you do with them and how you do it. I'm all ears (or eyes). Articles [...]

  35. Michal says:

    Grizz,

    I am really looking forward to your post about the sandbox because frankly, I think that I hit one big time 4 days ago. My weight loss niche blog got on average about 500 uv from Google and the next day about 5… it is very, very bad. I don’t know what to do.

  36. Bryan Clark says:

    Pardon me if someone has asked this already, but there are 70+ comments, and I’ve gotta get going. Just a quick question for you as far as these scraper sites go, you obviously know quite a bit about it.

    I read not too long ago that there was a plugin that inserted your link into the bottom of your post and was only viewable via RSS. This was apparently supposed to provide a link to you when your content was scraped. Is this effective? Or was it ever? I think I remember reading somewhere that Google reads the link as attribution from the original source (or something along those lines)… but I could be wrong.

    What is your take on this?

    • Grizzly says:

      Hi Bryan,

      I’m not familiar with the plugin other than I have heard about it. The problem is that links on the bottom of posts aren’t much good unless the scraper takes the full post. Most only use partial posts and the most effective way to deal with this is to manually insert your link in the first few lines of the first paragraph. Some will remove the links but the vast majority never do. This method is better as it doesn’t matter if the scraper uses the whole post or just a snippet – either way you get a link and yes, G does use that to determine the original source, or at least appears to.

  37. roger says:

    Grizz,
    1) If I have 100 sites and put links on them pointing to my own 10 different sites, all from the same IP address, am I inviting trouble? Is it legit, or how should I do that to get backlinks?

    2) When I submit one article to 200 directories as all of us do, isn’t that duplicate content (all 200 directories have the same info)?
    thanks.

    • Grizzly says:

      Roger,

      If you have too many links coming from the same IP Google will likely just ignore all the links. Normally you won’t be penalized unless they feel you have created a link farm. It’s best to not have more than 4-5 links coming from any 1 IP.

      Articles on article directories will always produce duplicate content – as soon as someone uses your article it is duplicated. I don’t submit the same article to every directory but many do. The fact is we don’t know how much juice G credits duplicate links so I assume very little but could be wrong. The link I want most is from the directory site itself and not the links from users who download the posts. If I submit the same article to all the directories then G may well only give me 1 good link – the link from the highest ranking directory site while it ignores all the others as duplicates. If I use a different article for each directory then I will get 1 good link from each directory. This is more work but it does produce good results.

      • roger says:

        Grizz,
        1. If I do a blogroll exchange then, depending on the site I am getting links from, I get say around 100 links. But you are saying NOT more than 3-4 links from one IP address.

        2. Another variation of the first one: say my site and a different site with whom I am doing a blogroll exchange are hosted by the same hosting company. Does it mean that, since the IP address is the same, those links don’t matter?

        3. I submit articles using an article submitter to 30 directories. But you mentioned that you send a different article to each directory. What service do you use, or how do you do this?

        thanks.

    • I approach link building in two ways, which is by separating my view of links into quality and then quantity links. People get hung up on issues like duplicate content when article marketing, but the fact is that almost all article links are really really low quality anyways. I bank on the fact that Google isn’t perfect at detection or discounting. If I push out a massive amount of articles and generate 1000 links and Google is only 95% (just guessing for example) accurate at catching and discounting links… that’s 50 low quality links all anchored. I look at article links the same way I do BMD links.
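
      The arithmetic behind that guess, as a tiny sketch. The 95% figure is the made-up example from above and not a measured rate, and `surviving_links` is just an illustrative name.

```python
def surviving_links(total_links: int, detection_rate: float) -> int:
    """Links expected to slip past the filter, for a detection rate in [0, 1]."""
    if not 0.0 <= detection_rate <= 1.0:
        raise ValueError("detection_rate must be between 0 and 1")
    # Whatever fraction is not caught still counts: total * (1 - rate).
    return round(total_links * (1.0 - detection_rate))


# 1000 article links with an assumed 95% catch rate leaves 50 that still count.
print(surviving_links(1000, 0.95))
```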

      Also, I think people get too caught up on the issue. I work two types of sites, flag ships and very low competition long tail niches.

      Niche sites require so little authority to dominate, there is no reason to worry. Just do your basics, optimize on site, and in a month or two you’ll hit page 1.

      For example, I’m working an adsense niche this morning with an average cpc of $20. The #1 guy has 7 links!! Throw up an SEO theme, push out 2-3 articles, submit to a few directories, do some BMD, and grab a couple of comments. Don’t stress it, in 1 to 3 months it will rank.

      Now for flagships that attack serious keywords, you’re not going to dominate SERPs with articles. Guys who dominate keywords like Make Money Online have some serious linking power. The duplicate content factor with articles is far from being a factor that will decide those rankings.

      I have an engineering background, so I always try to think of Google in a mathematical / science context. One thing about statistics, large scale data analysis and sampling is that there is always an error. With duplicate content, that error is your friend =)

      • David W. says:

        Justin,

        How small of a niche would you say you can get away with using that little effort on? I’ve been trying to rank for a couple of small terms, and have been going all out trying to get links, like one article per day, comments, all that. It sounds like maybe I should tone it down a little and just wait and build new sites. How do you gauge the appropriate effort for a given SERP position?

        • Fiar says:

          How long have you been at it? There are two main possibilities here. One is that the competition is too authoritative, and the other is that just not enough time has passed.

          It does take time, even for low competition terms to get the ranking. If you’re competing with PR5 sites and above, you will have trouble ranking as well.

          Of course, there’s also the sandbox.

          I would build a few more sites. I build mini-niches on a time frame of check in next month, and see how things are going. Then check in a month later. If you think of setting up the site, and then go to a monthly maintenance schedule, you’ll drive yourself a lot less crazy.

  38. Chanya says:

    Griz:

    Yesterday after reading the comment thread I used Copyscape to check some of my sites for duplicate content. Sure enough someone scraped an article (a very popular article I might add) from one of my new niche sites. Because the site is new it’s a PR0. However, the site containing the scraped content (a forum) is a PR5. I’ve twice contacted the webmaster about this issue asking them to remove the post from their forum but thus far they’ve neither removed the content nor responded to me.

    Unfortunately, this isn’t something I can rewrite because it’s a table that lists certain laws for each state. It took hours to research this information (I had to go to the Dept. of Health for each state) but I knew it would be popular because no other site had the info. in one place.

    Based on what you’ve written, G will probably look at both sites, give the 9-year old PR5 site the nod since they’ve got authority, and penalize my 3-month old site. That’s a bummer too because regardless of how many links I get to that article (the article has garnered a few comments as well) I can’t compete with a PR5 site!

    • Jez says:

      @Chanya… yup it is highly likely G will give kudos to the site that stole your content.

      G can no longer (if it ever could) tell the difference between a spam / auto generated site and a legit one, so it assigns a lot of weight to age and “trust”.

      Knowing this, spammers / scrapers / SEOs spend a lot of time and money acquiring sites with age / trust…

      You could try contacting G via webmaster central, but don’t hold your breath…

    • Jez says:

      Oh yeah… dupe content is NOT A PENALTY… it is a filter, so your site won’t get penalised, it is just that one page will not rank as well as it should.

      To get it back, you could add more relevant content around it, put more internal links into it, and get more external links… just depends how much time / effort you want to put into it.

  39. Jez says:

    Surprised no one has mentioned CGI hijacking here… clone a site with CGI scripts, point links at it and get the original site de-indexed / sent to supplemental.

    Dupe is a tough nut for Google to crack though, as a lot of legit content is dupe or near dupe… google any major news event and there will be near dupe stuff by different news agencies covering the story.

    • Hi Griz. I’ve been busy so I haven’t been keeping up with this discussion. Then I found out this morning that someone used CGI to hijack my health blog. So I wanted to see if anything in this discussion would apply. I read Jez’s comment re CGI hijacking, so I wonder if he might be able to help me.

      Jez, you don’t have a link so I can’t visit you. But if you or anyone else who is knowledgeable about CGI is reading this, would you please direct me to any info on fixes for hijacking?

      Fortunately, the butthead hijacker has his name and server info in the WhoIs, so I was able to contact the host to get his account suspended. At least I hope that’s what they’ll do. I haven’t heard back yet. But I’d still like to know how to fix the hole in WordPress that allowed this to happen.

      I’m so hopping mad at the jerk who did this right now; I can barely type!

  40. Daniel says:

    Grizz!

    You said this:
    As for building backlinks – yes you can do what you suggested however if you are only using Blogger for example, then no more than 4-5 backlinks as they all come from the same C class IP address.

    And this:
    If you have too many links coming from the same IP Google will likely just ignore all the links. Normally you won’t be penalized unless they feel you have created a link farm. It’s best to not have more than 4-5 links coming from any 1 IP.

    This is pretty huge. Crucial to link building I think. Did you learn this from testing, hunches, general experience, other bloggers? On one hand, it makes some sense that G would discount too many links from the same IP or IP block, but on the other hand shared hosting would have loads of sites on one IP. G could be discounting viable legit links. Same thing with Blogger. A popular site could get 100s of Blogger links, only 5 would carry any weight!?! That would effectively weight the less popular blogs equal to the more popular ones. Jeez I wish I had the time to run controlled experiments to figure all this shit out.

  41. sonni says:

    Wow, it’s like a forum in here. I don’t understand much of what is being said since I’m a clueless newbie, but I’m trying to figure it out. It might take longer than 2 months give or take a few, lol.

  42. Jez says:

    @daniel / griz

    Google does interrogate whois; there is some debate over whether they can access hidden details as they are a registrar. It is possible to register domains by proxy, i.e. have a third party register them for you so their details show instead of yours.

    Regards IPs… last I read G were toning down on this due to the issues Daniel mentions, i.e. large shared hosts, platforms like WordPress / Blogger.

    A lot of people hiring link builders stipulate different class C IPs, but that probably has as much to do with trying to stop the person they are paying simply dropping all the links onto their own sites as it does the G thing.

    Regards the idea that

    “A popular site could get 100s of Blogger links, only 5 would carry any weight”

    … simply not true, spammers try to build hundreds… thousands of blogger / myspace, live journal etc. sites… they still pass value, though, if you had 10k blogger links and nothing else I am sure you would hit a filter!

    Readers of this site can safely disregard the IP issue (IMO).

  43. is it any real surprise that we are finding out google bullshits us? they have little to no useful documentation. what documentation they DO have comes from great public turmoil and outcry at using their applications.

    much of google’s business is akin to saying ‘pick a card’, then holding the card behind their back while saying ‘no, you’re wrong’ without ever taking a look to see if you are.

  44. sonni says:

    Hi Griz,
    I have a wordpress blog and can’t manage it so I just got my first blogger blog. I managed to find your tutorial on how to add pages to the blog. I did exactly as you said however, after I changed the maxwidget=1 to 3, there was no showaddelement no/yes next to it, so I added it. Now I have ‘add a element’ twice in the little box. Did I do the right thing by adding showaddelement yes just as you showed it in the image? Or should I take it out?

    One more question: Do you know how to get rid of the date?

    Thanks for the tutorial on how to add pages. I hope it works. The only bad part was it didn’t look like yours did for some reason (the showaddelement part). And thanks for your help on everything.
    Sonni

  45. DennisJr says:

    Hi Sonni,
    You may want to check out Grizz’s Blogger basics. The site is listed below.

    http://beginner-blogger-basics.blogspot.com/2008/07/how-to-remove-date-and-author-on.html

  46. sonni says:

    Hi Dennis,
    You’re a life saver, since I’m a newbie there are many things I don’t know. I’m still trying to figure out how to add a page, actually I don’t know what I did as you can see from my message. Anyway, thanks to you and Griz I now am rid of that darn date. Thanks again.
    Sonni

  47. Hey Griz, nice to see you come over to the Dark…err…WordPress side!

    I couldn’t agree with you more. There was a time when I bought the whole dup content theory. I then started throwing some PLR articles on a few blogs that I didn’t have time to write content for.

    Imagine my amazement when those posts started ranking!

    I don’t use scraped content on many blogs, but like you noted, if you have some PR Google will overlook the content.

    I might add that I do change the Titles and make sure there are no links (other than the ones I want in them) in the articles.

    Don

  48. [...] has quite a few authority blogs and has been testing out ways to use authority blogs to show that Duplicate Content Is Googles Weakest Link [...]

  49. camping tent says:

    Interesting study Griz. A lot of my original content is scraped and it pisses me off when I’m not credited with the work. I never thought Google could do much about it and am now even more certain after reading your study. Though it does embolden me to continue using scraped content for building out my blogs. I use Orwell Pro quite a bit and have been using Frank’s Link Luv Builder as well (but I leave credit where credit is due). Honestly it never seemed to matter if my stuff was original or not. It didn’t affect whether I made money or not. It was always whether I paid attention to the site. Particularly if it was a blog. Seems if I post consistently and get backlinks it doesn’t matter what OR how much I post. I’m having just as good success writing a 100 word post with a photo as I do with the 4-500 word post.

    splork

  50. [...] at make money online with Griz, Griz has a post where he discusses Google’s handling of duplicate content. You should have a [...]
