Stop Site Scrapers with AntiLeech

Protecting original content from site scrapers is a constant challenge. It’s even harder now that full RSS feeds make blogs easier to scrape.

The usual prevention method most bloggers use is switching from full feeds to partial feeds. I never really worried about it, even though I publish full feeds. Lately, however, I’ve noticed that scraper sites (splogs) sometimes rank higher than my own site, which is alarming.

Search engines promise publishers that their systems can intelligently tell the original from the dupes, but I don’t think their success rate is any good. So I figured that getting a backlink from the splogs would help with the dupe issue.

Lately, I’ve been using the Feed Footer plugin, which adds custom footers (copyright, notices, advertisements) to the bottom of blog posts in the RSS feed. I’m sure most of you have seen them already.
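
To make the footer idea concrete, here is a minimal sketch in Python. It is not the Feed Footer plugin’s code (that is a WordPress plugin); the function name `add_feed_footer` and its parameters are made up for illustration. It just appends a credit line with a link back to the original post to each feed item, so a scraped copy carries the backlink with it.

```python
from html import escape


def add_feed_footer(item_html: str, post_url: str, site_name: str) -> str:
    """Append a footer crediting and linking back to the original post.

    Hypothetical helper for illustration; not part of any real plugin.
    """
    footer = (
        "<p>This post originally appeared on "
        f'<a href="{escape(post_url, quote=True)}">{escape(site_name)}</a>.</p>'
    )
    # The original item content is left untouched; the footer goes last.
    return item_html + "\n" + footer
```

Calling `add_feed_footer("<p>Hello</p>", "https://example.com/my-post", "My Blog")` returns the original markup followed by the credit paragraph.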

However, if that’s not enough, you can try the AntiLeech plugin:

AntiLeech produces a fake set of content especially for them that includes links back to your site and sends it only to them. When they steal this content, it appears online just like normal, except now you’ve turned the tables on them and have provided them with useless content.

AntiLeech can detect a splogger bot using its User-Agent string (an identifier that some bots send when they are collecting data), or by IP address. You can enter a User-Agent or an IP address into the Options panel of your WordPress blog. When a visitor with a qualifying (any checked option on the options page) User-Agent or IP address visits your site, they will see only the generated content. They will see it in your page layout and in your feeds. Anywhere you’re normally outputting content, that’s where the fake content will appear to them.

Regular users whose browsers do not match these strings will see your normal content. RSS aggregators should be able to display your content normally, too.
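
The detection described above can be sketched roughly like this. This is Python for illustration only, not AntiLeech’s actual code; the bot name, the IP address (taken from a range reserved for documentation) and the function names are all assumptions. The idea is simply: if the User-Agent contains a listed string, or the visitor’s IP is on the list, serve decoy content with a link back to the site; otherwise serve the real post.

```python
import ipaddress

# Hypothetical block lists; a real setup would fill these in from an options page.
BLOCKED_AGENT_SUBSTRINGS = ["ExampleScraperBot"]
BLOCKED_IPS = {ipaddress.ip_address("203.0.113.7")}


def is_scraper(user_agent, remote_ip):
    """Return True if the User-Agent or the IP address matches a block list."""
    agent = (user_agent or "").lower()
    if any(s.lower() in agent for s in BLOCKED_AGENT_SUBSTRINGS):
        return True
    try:
        return ipaddress.ip_address(remote_ip) in BLOCKED_IPS
    except ValueError:
        # Unparseable address: treat as a regular visitor.
        return False


def content_for(user_agent, remote_ip, real_content, site_url):
    """Serve decoy content with a backlink to scrapers, real content to everyone else."""
    if is_scraper(user_agent, remote_ip):
        return f'<p>Read the original at <a href="{site_url}">{site_url}</a>.</p>'
    return real_content
```

Note that a User-Agent string is whatever the client chooses to send, so a scraper that pretends to be a normal browser would only be caught by the IP check.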

You can download the plugin here. AntiLeech does not actually stop splogger bots, or the sploggers themselves, from accessing your site; they can still copy and paste manually. Still, it’s one less thing to worry about.

Abe Olandres

Abe is the founder and Editor-in-Chief of YugaTech with over 20 years of experience in the technology industry. He is one of the pioneers of blogging in the country and considered by many as the Father of Tech Blogging in the Philippines. He is also a technology consultant, a tech columnist with several national publications, resource speaker and mentor/advisor to several start-up companies.

14 Responses

  1. karla says:

    Thanks for the tip!

    So many people are scraping blogs these days. Argh! And my rockersworld.com blog is, of course, one of those being scraped.

  2. JC John SESE Cuneta says:

    They’re doing it for the money. Or for testing their scripts. Other than those two reasons, I don’t see other reasons to be plausible for their kind of actions. ;) At least for me.

  3. jhay says:

    Nice tip. I’ll give the plugin a try, my blog has been a victim of splogs since last year. This would help turn the tide against them.

  4. JC John SESE Cuneta says:

    @ChrisMo: It’s a good solution. However, sites and blogs whose content is legally re-published or syndicated by other sites, or by members of online newspapers, rely heavily on feeds with full post content.

    They have no option but to provide it. Secondly, there are feed subscribers who prefer to read the whole content in the feed rather than visit the site just to read the rest of the post.

    It is a war that the Feed/Syndication Community will soon have to face in full force. However, based on my experience and other people’s, RSS-based feeds are mostly the victims, while Atom-based feeds have fewer victims. To begin with, Atom is a WebStandard; RSS, with its endless flavors, is not.

  5. ChrisMo says:

    My solution would be an RSS feed with only SE-friendly URLs, I mean URLs only… That way there isn’t any real content to scrape, just a link to the article on the site. You need to write better post titles though…

  6. Blogoloco says:

    I’ve actually seen some of my articles on another website.

    Its Alexa rank is far higher than mine, but I don’t understand why they have to do that.

  7. ms.jane says:

    Nice post, master yuga.

    For BrianB: what’s the connection of your link? Tsk tsk, self-promoting. That style of yours seems to be getting more frequent.

  8. BrianB says:

    How fragile is the intarwebs…

    http://www.ohgizmo.com/2008/03/03/the-internet-its-more-tangled-than-you-think/

  9. JC John SESE Cuneta says:

    Yep. If you are researching, you’ll end up on sites that don’t really have the content you are looking for because they just stole it via feeds. I’ve encountered around 12 already that specifically target Pinoy-owned blogs.

    Fun to watch, but not fun anymore if you are one of the victims. :p

  10. otoyreyes says:

    nice feed abe :)

  11. Abe Olandres says:

    @calvin, those are sploggers feeding off the rss.

  12. calvin says:

    It looks like other sites are taking content from pinoytravelblog. travelhostel, I think, or something. It’s exactly the same, even the categories. Hahaha, it’s like a duplicate of the site but with a different theme. Did you do that on purpose, Abe?

  13. Jomark Osabel says:

    I will give this plugin a try. Thanks Yuga.
