Has your blog content ever been stolen? Have you ever searched Google and been incensed to discover that the stolen version, i.e. duplicate content, ranks higher in the SERPs (search engine results pages) than your original article?
Duplicate content is content that can be accessed on more than one URL. “Duplicate content generally refers to substantive blocks of content within or across domains that either completely match other content or are appreciably similar.” If search engine spiders can’t tell which version of a web page or document is the original or canonical version, the consequence is reduced search visibility.
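To make “appreciably similar” a little more concrete, here is a minimal sketch that compares two pieces of text by the word sequences they share. This is only an illustration and is not how Google measures duplication; the function names `shingles` and `similarity` and the five-word window are assumptions chosen for the example.

```python
import re


def shingles(text, size=5):
    """Return the set of overlapping word sequences (shingles) in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Text shorter than one window: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(text_a, text_b, size=5):
    """Jaccard similarity: shared shingles divided by all distinct shingles."""
    a, b = shingles(text_a, size), shingles(text_b, size)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    original = "Duplicate content generally refers to substantive blocks of content"
    scraped = original + " copied onto another site"
    print(round(similarity(original, scraped), 2))
```

A score near 1 suggests a near-verbatim copy, while a score near 0 suggests unrelated text. Any cut-off between the two is a judgment call.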
Duplicate content within a domain
Duplicate content within a domain is a common problem on blogs where multiple URLs can refer to the same content, for example, if you have full posts displaying in Archives, Categories pages and Tag pages. On self-hosted WordPress.org installs, the “noindex, follow” robots meta tag can be used to instruct Google and the other search engines to crawl the page and follow its links but not add the page to their index. This cannot be done on free hosted blogs on WordPress.com, as it’s a multi-user blogging platform where users cannot access and edit themes or templates. With Panda rolling out globally and Google giving advice to remove duplicate content and non-original content, what is one to do?
To reduce duplicate content within my domain I have taken these steps:
- Have set the blog RSS feed under Settings > Reading to “Summary” rather than “Full” to reduce content theft.
- Have a Copyright page and copyright notices, also to reduce content theft.
- Do not use a theme that displays full posts in Archive pages, Categories pages and Tags pages. Instead I use the Inuit Types theme, as it automatically provides excerpts of post content on the Front page, Archives, Categories and Tags pages.
- Copy and paste a sentence from my latest post into Google search a few hours after publication to search for duplicates.
- Use Copyscape to search for duplicates.
- Use Plagium (beta) to track plagiarism.
- Set up Google Alerts for my domain names.
- Act immediately when I discover my content has been stolen and file a DMCA takedown notice when required.
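The “copy and paste a sentence into Google search” step above can be partly automated. The sketch below picks the longest sentence from a post and builds an exact-phrase search link that you can open in a browser. The function names are hypothetical, and the `https://www.google.com/search?q=` base address is an assumption about the search URL format; swap in whichever search engine you prefer.

```python
import re
from urllib.parse import quote_plus


def longest_sentence(post_text):
    """Pick the longest sentence; long sentences tend to be the most distinctive."""
    parts = re.split(r"(?<=[.!?])\s+", post_text)
    sentences = [s.strip() for s in parts if s.strip()]
    return max(sentences, key=len) if sentences else ""


def exact_phrase_search_url(sentence, base="https://www.google.com/search?q="):
    """Wrap the sentence in double quotes so the engine looks for the exact phrase."""
    # The base address is an assumption, not something stated in this post.
    return base + quote_plus('"' + sentence + '"')


if __name__ == "__main__":
    post = "Short intro. This is the distinctive sentence from my latest post. The end."
    print(exact_phrase_search_url(longest_sentence(post)))
```

Any result that is not your own domain is worth a closer look.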
Duplicate content across domains
Though it isn’t the only cause, the most obvious cause of duplicate content is when people intentionally lift content from other sites for their own use. Many content thieves are using Blogspot free hosting and AdSense (Google owns both) to make money from stolen blog content. In March Google decided to change the search algorithm by means of the “Panda update.” It was aimed at rooting out duplicate content from content farms, thereby delivering relevant results and enriching users’ search experience. The bad news is that Google’s new “Panda” algorithm is ranking some stolen content higher than the original versions.
Kunal Pradhan of Ahmedabad, India, posed this question to Matt Cutts of Google:
“Google crawls site A every hour and site B once in a day. Site B writes an article, site A copies it changing time stamp. Site A gets crawled first by Googlebot. Whose content is original in Google’s eyes and rank highly? If it’s A, then how does that do justice to site B?”