Content theft (scraping) is one of the biggest headaches for bloggers. It is never pleasant to find your own article on another blog. Most of the time, the blogs that scrape content are Made for AdSense (MFA) sites with no original content.
Last week, we conducted a small poll about content scraping. The question was “Has Your Content Ever Been Stolen?” Here are the results:
How to Find Scrapers
The first step in fighting scrapers is finding them. Here are some ways to find scraped content:
- Internal Linking: Internal linking is one of the best strategies for finding scrapers. Most scrapers copy your feed exactly, so the links to your other posts are kept. If you use WordPress, you will see these linkbacks on your dashboard, or you can search Google for backlinks with link:yourdomain.
- RSS Footer Links: Since most scraping is done through RSS feeds, it's a good idea to add a copyright notice with a link back to your blog in your feeds. WordPress users can use the RSS Footer plugin for this.
- Copyscape: Copyscape lets you search the web for duplicate content and offers a warning banner that you can add to your blog.
- CopyGator: A better service than Copyscape, CopyGator finds duplicate content across the blogosphere based on pages or feeds. You can get email alerts, enter your feed or blog URL to find duplicates, and ping CopyGator to check your latest content for duplicates.
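The RSS footer idea can be sketched in a few lines for readers who generate their own feeds instead of using the WordPress plugin. This is a minimal sketch, assuming a plain RSS 2.0 feed with `<title>`, `<link>` and `<description>` in each `<item>`; the function name `add_rss_footer` and the footer wording are hypothetical, not what the RSS Footer plugin actually produces.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical footer template; wording and markup are illustrative only.
FOOTER = ('<p>&copy; {site}. The post <a href="{link}">{title}</a> '
          'first appeared on {site}.</p>')

def add_rss_footer(rss_xml: str, site: str) -> str:
    """Append a copyright notice with a link back to the original post
    to the <description> of every <item> in an RSS 2.0 feed."""
    root = ET.fromstring(rss_xml)
    for item in root.iter("item"):
        # HTML-escape values before putting them inside the footer markup.
        link = html.escape(item.findtext("link", default=""))
        title = html.escape(item.findtext("title", default=""))
        desc = item.find("description")
        if desc is None:
            desc = ET.SubElement(item, "description")
        footer = FOOTER.format(site=html.escape(site), link=link, title=title)
        desc.text = (desc.text or "") + footer
    # ElementTree escapes the footer HTML on output, which is how HTML
    # inside an RSS <description> is normally carried.
    return ET.tostring(root, encoding="unicode")
```

Because a scraper that copies the feed verbatim also copies the footer, the backlink it carries points readers (and your linkback list) to the original post.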
Scrapers are getting smarter every day and use new techniques to keep you from finding the duplicate content. I have noticed that many splogs use scripts that rewrite the links to your blog so they point to their own site. One scraper I noticed was copying the Daily Blog Tips feed and replacing the dailyblogtips.com part of each link with its own domain. This kept it out of Google's backlink results, but CopyGator still found it.
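Rewritten links defeat backlink searches, but the copied text itself barely changes, which is why a content-based check still works. The sketch below compares the visible text of two pages using word 5-gram "shingles" and Jaccard similarity. It is an illustration of the general idea, not how CopyGator actually works; the function names and the 0.5 threshold are assumptions.

```python
import re

def shingles(page_html: str, k: int = 5) -> set[tuple[str, ...]]:
    """Strip HTML tags (including rewritten hrefs) and return lowercase word k-grams."""
    text = re.sub(r"<[^>]+>", " ", page_html)
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a: str, b: str, k: int = 5) -> float:
    """Share of shingles the two pages have in common (0.0 to 1.0)."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

def looks_scraped(original: str, suspect: str, threshold: float = 0.5) -> bool:
    """Hypothetical rule of thumb: flag pages sharing at least `threshold` of shingles."""
    return jaccard(original, suspect) >= threshold
```

Because tags are stripped before comparing, a copy whose `href` attributes were rewritten to the scraper's domain still scores as a near-identical match.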
How to Fight Scrapers
Once you find a scraper, the next step is to take action. Here are the steps you can take against scrapers:
- Secure Evidence: Check whether Google has cached the scraper's page. Cached pages are snapshots that search robots take as they crawl a site. Google keeps them as a backup, but you can open that version and save a copy. You can also use WebCite to archive a copy of the page.
- Legal Action: Start by contacting the blog owner and asking them to take the duplicate content down.
- Report to AdSense: Most splogs run AdSense ads, so complaining is easy. Just click the Ads by Google logo. In the new tab that appears, click “Send Google your thoughts on the site or the ads you just saw” and report the violation.
- Report to Their Host: Use whoishostingthis.com to find the splog's host, then send the host a detailed report.
- Report to the Domain Company: Such blogs often use free domain services like co.cc. In that case, you can notify the domain provider.
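When you save your own copy of a scraper's page as evidence, it can help to record when you captured it and a fingerprint of exactly what you saved, so you can later show the copy was not altered. This is a minimal sketch, assuming you already have the page HTML as a string; `evidence_record` is a hypothetical helper, and such a record is a personal bookkeeping aid, not a substitute for archives like Google's cache or WebCite.

```python
import hashlib
from datetime import datetime, timezone

def evidence_record(url: str, page_html: str) -> dict:
    """Describe a saved copy of a suspected scraper page: its URL,
    the UTC capture time, and a SHA-256 digest of the exact bytes saved."""
    data = page_html.encode("utf-8")
    return {
        "url": url,
        "captured_at": datetime.now(timezone.utc).isoformat(),
        "sha256": hashlib.sha256(data).hexdigest(),
        "bytes": len(data),
    }
```

Store the record (for example as JSON) alongside the saved HTML; re-hashing the file later and comparing digests shows whether it has changed.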
How do you deal with scraping? Tell us in the comments.