Is Duplicate Content a threat to your SEO strategies?
Before we write any content, we remind ourselves not to produce anything that resembles content on another site. We try to be as original as we can.
Why do we take such pains? Why can’t we simply copy and paste content from elsewhere? It would be so much simpler.
Would that be a problem? Let’s find out.
What is Duplicate content?
Duplicate content in the online space means having the same content on two or more URLs. This can happen for many reasons. When identical content appears on several URLs, it becomes difficult for Search Engines to decide which one is the original, and therefore which version to index and show on SERPs. Matt Cutts, former head of Google’s webspam team, said that he personally would not stress about duplicate content unless it was spammy.
Let us study the different types of duplicates:
- True Duplicates – When the content of a page is exactly the same as that of another page but sits on a different URL, it is called a true duplicate.
- Near Duplicates – These occur when part of the text, an image or the order of the content is similar to that of other web pages.
- Cross-domain Duplicates – This is when two websites have the same content. These duplicates can be either true or near.
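The difference between true and near duplicates can be made concrete with a small text-similarity check. The following is a minimal sketch, not how any Search Engine actually does it: it assumes pages are compared as plain text using three-word "shingles" and Jaccard similarity, and the function names and the 0.5 threshold are arbitrary choices made for illustration.

```python
import re


def shingles(text, size=3):
    """Return the set of overlapping word n-grams ("shingles") in text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b):
    """Jaccard similarity of the two texts' shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a), shingles(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def classify(a, b, near_threshold=0.5):
    """Label a pair of texts; near_threshold is an assumed, tunable cut-off."""
    score = jaccard(a, b)
    if score == 1.0:
        # Same shingle set: identical apart from case and punctuation.
        return "true duplicate"
    if score >= near_threshold:
        return "near duplicate"
    return "distinct"


page_a = "The quick brown fox jumps over the lazy dog near the river bank"
page_b = "The quick brown fox jumps over the lazy dog near the old bridge"
print(classify(page_a, page_a))
print(classify(page_a, page_b))
```

Running the same check between pages on two different domains would flag cross-domain duplicates in the same way.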
Examples of Duplicate Content:
- www vs non-www URLs- This happens when links or social media mentions point to the non-preferred version of a URL, and both versions end up getting indexed.
- Staging servers- These duplicates come from subdomains. For example, you are working on a new site design, you have set up a separate subdomain for it, and you forgot to block it with robots.txt. Crawlers then index both the live site and the staging copy.
- Trailing Slashes (“/”)- For example, www.example.com/blogs and www.example.com/blogs/ are recognised as two different URLs. These days Google automatically canonicalizes these URLs in most cases.
- Secure https pages- Your site may be reachable at both http://www.example.com and the secure https://www.example.com. It is quite possible that both versions are being indexed by the Search Engines.
- Homepage duplicates- This is when both the root domain, www.example.com, and the homepage URL, www.example.com/index.htm, are getting indexed.
- Session IDs- This usually happens on e-commerce sites, which assign each visitor a tracking ID to follow their activity. Often the session ID gets appended to the URL, like www.example.com/?session=1234598. Every new session then creates another duplicate URL for the same page. If you don’t solve this issue, all these pages may get indexed by the Search Engines.
- Affiliate Tracking- Just like session IDs, when a site gives its affiliates a tracking variable to add to URLs, it leads to duplication of pages.
- Duplicate Paths- These usually arise on e-commerce sites that list the same product in more than one category, so one product can be reached through more than one URL. A simple solution is to remove the ‘category’ from the product URL.
- Functional Parameters- These are URL parameters that change how a page is presented but hold no value in search. For example, a printable version of a page has its own URL; it adds nothing for search but duplicates the original.
- International Duplicates- This happens when you publish the same content in the same language for different countries on one domain. For example, www.example.com/us/product, www.example.com/uk/product and www.example.com/in/product.
- Search Sorts- These duplicates occur when sorting results in ascending or descending order creates a separate URL. The pages contain the same content in a different order, so they are near duplicates.
- Search Filters- Filters narrow down internal search results within a site, and each filtered view can get its own URL. E-commerce sites usually offer them because they carry a wide range of products.
- Search Pagination- This is caused when your internal search results are split across several pages. This problem is a bit difficult to solve.
- Product Variations- These pages branch off from one main product page and differ only in a particular feature or option. For example, if a mobile phone is available on your site in four colours, the product page may be split into four colour-specific pages. These are near duplicates, as all of them have the same content except for the colour.
- GEO keyword variations- This happened when site developers created pages with a location in the URL for Local SEO. All those pages were duplicates with the same content. The URLs looked like www.example.com/product/macbook-air/new-york or www.example.com/product/macbook-air/london.
- Other “Thin” content- ‘Thin’ refers to content that offers little unique value. If your site has a lot of copied content, or lots of ads and very little content, it is termed ‘thin’. Improve your content strategy to create unique, valuable content.
- Syndicated content- This is content that has been taken from your site with your permission and posted on another site. Even though you have given permission, it is still duplication, as two domains carry the same content, and Search Engines will also view it as duplicate.
- Scraped content- This is like syndicated content, the only difference being that the content is copied from someone else’s site without asking for permission. This is unethical.
- Cross-ccTLD Duplicates- Even if you have different country-code top-level domains for various countries, having the same content on every domain leads to duplication.
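Several of the URL-level examples above (www vs non-www, http vs https, trailing slashes, index.htm and tracking parameters) can be collapsed by normalizing each URL to one preferred form. The sketch below is only an illustration under stated assumptions: the site is assumed to prefer https and the www host and to have no other subdomains, and the parameter names in IGNORED_PARAMS are hypothetical, so substitute the ones your own site emits.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter names that do not change the page content.
IGNORED_PARAMS = {"session", "affiliate", "sort", "print"}
# Assumed default documents that duplicate their directory URL.
INDEX_FILES = {"index.htm", "index.html", "index.php"}


def canonical_url(url):
    """Map the common duplicate variants of a URL to one preferred form."""
    # Accept scheme-less input such as "www.example.com/blogs".
    parts = urlsplit(url if "://" in url else "http://" + url)

    # Assumption: the preferred host is the www version.
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    host = "www." + host

    # Drop trailing slashes and default documents such as index.htm.
    segments = [s for s in parts.path.split("/") if s]
    if segments and segments[-1].lower() in INDEX_FILES:
        segments.pop()
    path = "/" + "/".join(segments)

    # Remove session, affiliate and similar functional parameters.
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k.lower() not in IGNORED_PARAMS]

    # Assumption: the preferred scheme is https; fragments are discarded.
    return urlunsplit(("https", host, path, urlencode(kept), ""))


for variant in ("http://example.com/blogs/",
                "www.example.com/blogs",
                "https://www.example.com/blogs/index.htm?session=1234598"):
    print(variant, "->", canonical_url(variant))
```

All three variants in the loop map to the same URL, which is the one you would redirect to or declare as canonical.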
The truth behind what causes duplicate content issues!
Let us admit it: duplicate content issues are something we all face at some point. This does not make us copycats. Duplicate content can arise for many reasons, but the most common ones are technical, and they usually originate at the developer’s end.
Some of the causes are-
- Misunderstanding the concept of URLs- Suppose the website is backed by a database, and the same article in that database can be retrieved through several different URLs. This happens when the developer treats the article’s ID in the website’s database as the unique identifier instead of the URL. For a Search Engine, the unique identifier is the URL. If each article is served from a single URL, this source of duplication disappears.
- Session IDs- Sites usually assign these to track visitors. On e-commerce sites, the session typically stores the items the visitor has placed in the shopping cart; it can be thought of as a brief history of what the visitor did on your website. The session ID needs to be stored somewhere, and normally it is stored in a cookie. Search Engines do not store cookies, so some systems fall back to appending the session ID to the URL. Since the session ID is unique to that session, each one creates a new URL, leading to duplicate content.
- Content Syndication- Other sites may republish content from your site, sometimes without your consent and without citing your website as the source. This confuses Search Engines, as they find the same content on two domains.
- Comment Pagination- WordPress and other systems have an option to paginate comments. With this option enabled, the article content is duplicated across the article URL plus /comment-page-1/, /comment-page-2/, etc.
- Printer Friendly Pages- Websites sometimes create two versions of each page. One has the content, ads and so on; this is the page we see online. The other is a printer-friendly version containing only the content. Unless we block the printer-friendly page from being crawled, Search Engines will find it, and since the two pages have the same content, they will regard them as duplicates.
- WWW vs Non-WWW- A website may have two versions, a www and a non-www one, or an http and an https one. If both are accessible to the Search Engines, it creates duplication of content.
Ways to solve Duplicate Content:
There are several ways to solve duplicate content issues and ensure that your visitors see the content you want them to see.
- 301 redirect- Set up 301 (permanent) redirects, for example in your site’s .htaccess file on an Apache server. If you have restructured or redesigned your site, a 301 redirect sends both Search Engines and visitors to the correct URL.
- 404 Error- The easiest way to deal with duplicate content is to remove it and return a 404 error. If the content is of no value for search or for visitors, it’s better to remove it completely.
- Be consistent- Keep your internal linking consistent. Link to one version only, and not to http://www.example.com/page, http://www.example.com/page/ and http://www.example.com/page/index.htm interchangeably.
- Use top-level domains- Where possible, use country-specific top-level domains to handle country-specific content, such as www.example.in (for India) or www.example.au (for Australia), in preference to www.example.com/in or in.example.com. They make it easier for Search Engines to tell which country the content is meant for.
- Syndicate carefully- If you syndicate your content on other sites, you cannot know which copy Search Engines will consider most appropriate and show on SERPs. Hence, it is advisable to ask the sites that take your content to link back to your original. You can also ask them to use the noindex meta tag for that content. This tag allows Google to crawl the content but prevents it from indexing it. This way yours is the only copy that will show up on SERPs.
- Tell preferred domain- Through Search Console you can tell Google which version of your site you prefer to have indexed: http://www.example.com or http://example.com.
- Placeholder pages- Do not publish pages on your site if you don’t have enough content for them. Users don’t like seeing empty pages. If you have to publish such pages, it is advisable to use the noindex meta tag, which will prevent them from being indexed.
- Understand your site’s content management system- Sites, blogs and discussion forums often display the same content in more than one place. For example, an article you post in the blog section of your site may also appear on your homepage. Know how your CMS displays content so that you can spot such repeats.
- Minimize similar content- If your site has similar content on many pages, try to merge it into one page if you can. Think of ways to reduce the number of pages with similar content.
- Rel="canonical"- Use the rel="canonical" link tag. The tag looks like-
<link href="http://www.example.com/canonical-version-of-page/" rel="canonical" />
It is placed in the <head> section of the HTML of the duplicate page. This tag tells Search Engines to treat the page as a copy of the URL given in href, which is the version that should be indexed.
- Noindex follow meta robots tag- This tells Search Engines which pages they are not to index. It allows Search Engines to crawl the page and follow the links on it, but prevents them from indexing the page.
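To confirm that a page actually carries the canonical tag or the meta robots tag described above, you can inspect its HTML. The following is a minimal sketch using only Python's standard html.parser module; the class and function names are made up for illustration, and the HTML is assumed to have been fetched already.

```python
from html.parser import HTMLParser


class IndexingHints(HTMLParser):
    """Collect the rel=canonical URL and meta robots directives of a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.robots = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rel = (attrs.get("rel") or "").lower().split()
        name = (attrs.get("name") or "").lower()
        if tag == "link" and "canonical" in rel:
            self.canonical = attrs.get("href")
        elif tag == "meta" and name == "robots":
            content = attrs.get("content") or ""
            self.robots = [d.strip().lower() for d in content.split(",")]


def indexing_hints(html):
    """Return (canonical_url_or_None, list_of_robots_directives)."""
    parser = IndexingHints()
    parser.feed(html)
    return parser.canonical, parser.robots


sample = """
<html><head>
<link href="http://www.example.com/canonical-version-of-page/" rel="canonical" />
<meta name="robots" content="noindex, follow">
</head><body>Duplicate page</body></html>
"""
canonical, robots = indexing_hints(sample)
print(canonical)
print(robots)
```

A page would normally use one of the two hints rather than both; the sample combines them only to exercise the parser.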
Did you know? Duplicate content doesn’t hurt you unless it is spammy.
So far we have talked as if duplicate content will penalize you, punish you or harm you. But according to Matt Cutts, Google will not punish you for duplicate content unless that content is spammy or stuffed with keywords.
According to him, “I wouldn’t stress about this unless the content that you have duplicated is spammy or keyword stuffing.”
The main reason to avoid duplicate content is that Google doesn’t want to show repeated content to its users.
Tools which help to identify Duplicate Content:
By now, you must have got an idea about duplicate content. But how will you know which kind of duplicate content issue your site is facing? Here are some tools to help you out-
- Google Webmaster Tools
Google Webmaster Tools tracks down duplicate title tags and meta tags. In your Google Webmaster Tools account, go to Diagnostics > HTML suggestions, which shows you a table. From that table you can choose ‘duplicate title tags’ or ‘duplicate meta tags’ to get a list of the affected pages. This tool does not detect all duplication errors, but it does flag a few to begin with.
- Google’s “Site:” Command
This is a powerful tool that you can use to check whether Google has indexed your site, and which of its pages.
To check for homepage duplicates write-
site:example.com intitle:"Home Page Title"
Type this into the Google search bar and check the results you get. Leave the www out of the site: command so that the results show both the www and non-www versions.
To detect search sort problems you can use-
To find blocks of repeated content you can use-
site:example.com "this is a block of content"
To find pages whose titles contain a particular keyword, say Keyword X, use-
site:example.com intitle:"Keyword X"
Through these site: commands Google will show all the pages on your site that contain that particular content.
- SEOmoz Campaign Manager
This tool also reports ‘duplicate page titles’ and ‘duplicate page content’.
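The ‘duplicate title’ reports these tools produce can be approximated for your own pages with a few lines of code. The sketch below is only an illustration and not how any of the tools above work internally: it assumes you already have the HTML of each page in a dictionary keyed by URL, and the function name is hypothetical.

```python
import re
from collections import defaultdict

# Simple pattern for the contents of the first <title> element.
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def duplicate_titles(pages):
    """pages maps URL -> HTML; return {title: [urls]} for repeated titles."""
    by_title = defaultdict(list)
    for url, html in pages.items():
        match = TITLE_RE.search(html)
        if match:
            # Collapse whitespace and ignore case when comparing titles.
            title = " ".join(match.group(1).split()).lower()
            by_title[title].append(url)
    return {t: sorted(urls) for t, urls in by_title.items() if len(urls) > 1}


pages = {
    "http://www.example.com/": "<html><head><title>Home Page Title</title></head></html>",
    "http://www.example.com/index.htm": "<html><head><title>Home  Page Title</title></head></html>",
    "http://www.example.com/blogs": "<html><head><title>Blogs</title></head></html>",
}
for title, urls in duplicate_titles(pages).items():
    print(title, urls)
```

Pages that share a title are only candidates for duplication; comparing their body text is still needed to confirm it.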
You don’t have to worry too much about duplicate content issues. It is better to be aware of the issues your site is facing and work towards correcting them. Search Engines know that duplicated content exists online and that it often happens unknowingly, which is perfectly OK. Almost every site faces duplicate content issues at some point, so just fix them. Fixing might take some time, but after it’s done, you’re good to go.
Search Engines want to show diverse results to users. They just want to know which pages they should index and rank, and which they should not. There is no penalty as such for duplicate content, but there is a heavy penalty for ‘THIN’ content. Just make sure that, while fixing duplicate content errors, your pages do not end up being classified as ‘THIN’.