March 10, 2009

Unless you have taken care to handle duplicate content issues on your eCommerce web site or Content Management System (CMS), Search Engines (SE) will see multiple versions of the exact same content, even though you may have entered the content only once. This is often referred to as the “canonicalization” issue. At first this might not seem like such a bad problem, right? More content from your site indexed in search engines, but…

Unfortunately for business owners, duplicate content is not good for your Search Engine Optimization (SEO) strategy. When the same content is spread across different URLs, the SE credit for that content is diluted among them. A better SEO strategy is to ensure that the search engine indexes a single version of the content, giving that version as much credit as possible.

Why do search engines dislike duplicate content on your site? Likely because their spiders (crawlers) spend extra resources fetching the same content repeatedly, and because search engines do not want the same content showing up multiple times in their search engine results pages (SERPs).

5 reasons search engines see duplicate content on your CMS or eCommerce web site:

1) Duplications Due to Sorting Values in Querystrings

The first reason is the use of certain types of sorting values in querystring parameters:
e.g., http://www.example.com/index.php?catid=2&sort=price

In this case, the sort parameter in the querystring is “price”, and there would likely be links on the page to other sorting options, such as “name” or “date”. So, while the page shows essentially the same content to visitors, each sorting option produces a different URL, and search engines may see multiple versions of the same content. Read on to learn techniques to fix this problem.
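One way to collapse the sorting variants is to strip presentation-only parameters when building each page’s canonical URL, then emit that URL in the canonical link element covered in the fixes below. The following Python sketch assumes that “sort” is the only such parameter on your site and that the remaining parameters (like catid) actually select content; it is an illustration, not a drop-in module for any particular CMS.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: "sort" only reorders the listing. Add any other
# presentation-only parameters your software uses.
PRESENTATION_PARAMS = {"sort"}


def canonical_listing_url(url: str) -> str:
    """Drop presentation-only parameters and order the rest consistently."""
    parts = urlsplit(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in PRESENTATION_PARAMS
    )
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_link_tag(url: str) -> str:
    """Build the <link rel="canonical"> element for the page served at url."""
    return f'<link rel="canonical" href="{escape(canonical_listing_url(url))}"/>'


print(canonical_link_tag("http://www.example.com/index.php?catid=2&sort=price"))
# <link rel="canonical" href="http://www.example.com/index.php?catid=2"/>
```

Sorting the remaining parameters means that links written as ?catid=2&page=3 and ?page=3&catid=2 also resolve to the same canonical URL.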

Important: Search engines index the content found at individual URLs, not web site “pages” or “files”.

2) Inconsistent Links to Your Site

In general, you should obtain as many quality links from reputable sources related to your industry as possible. While you cannot control how people link to you, the way they link to your site is a second reason search engines see duplicate content. Take a look at the following URLs:

http://www.example.com

https://www.example.com

http://www.example.com/index.htm

https://www.example.com/index.htm

http://example.com

https://example.com

http://example.com/index.htm

https://example.com/index.htm

While all these URLs may show the exact same information to your visitors (you might even consider them the same “page”), to a search engine they are different indexable URLs. Likewise, if people mix UPPERCASE and lowercase letters in their links to your site, or include or omit the “www.”, these links may dilute your SE credit for the content, depending on the search engine. More on how to solve this problem below.
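To see how these variants can be folded into one preferred address, here is a Python sketch that maps all eight URLs above (plus an uppercase variant) to a single URL. It assumes http://www.example.com/ is the preferred version and index.htm is the default document. Hostnames are case-insensitive, but whether paths are depends on your server, so the sketch only ignores case for the default document name. In practice you would pair a mapping like this with the 301 redirects described below.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumptions for this sketch: the preferred version of the site is
# http://www.example.com/ and "index.htm" is the default document.
PREFERRED_SCHEME = "http"
PREFERRED_HOST = "www.example.com"
HOST_ALIASES = {"example.com", "www.example.com"}
DEFAULT_DOCUMENT = "index.htm"


def preferred_url(url: str) -> str:
    """Map scheme, host, and default-document variants to one preferred URL."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # .hostname is already lowercased
    if host not in HOST_ALIASES:
        return url  # not our site; leave it alone
    path = parts.path or "/"
    directory, _, last = path.rpartition("/")
    if last.lower() == DEFAULT_DOCUMENT:
        path = directory + "/"
    return urlunsplit((PREFERRED_SCHEME, PREFERRED_HOST, path, parts.query, ""))


variants = [
    "http://www.example.com", "https://www.example.com",
    "http://www.example.com/index.htm", "https://www.example.com/index.htm",
    "http://example.com", "https://example.com",
    "http://example.com/index.htm", "https://example.com/index.htm",
    "HTTP://WWW.Example.COM/INDEX.HTM",
]
print({preferred_url(v) for v in variants})
# {'http://www.example.com/'}
```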

3) Duplications From Session IDs in Querystrings

Session IDs in URL querystrings are a third major reason for duplicate content issues on your web site:
e.g., http://www.example.com/index.php?sessionid=abcdefghijkl12345

Session IDs tell a web server that a page is being used by a particular visitor. They are used to keep track of information about a visitor’s visit to the web site and are extremely useful (and often necessary) on eCommerce web sites. But again, remember that search engines index URLs, not “pages” or “files”, so if there are links to URLs containing session IDs, these links may cause the SE to index multiple versions of the same content. Renowned Google guru Matt Cutts recounted a story in which a single privacy policy was indexed several thousand times because of session IDs in URLs.
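Before fixing session IDs, it can help to measure the damage. This Python sketch groups crawled or logged URLs that differ only by their session parameter. The parameter name “sessionid” is taken from the example URL above, and the 3,000-URL crawl is made up to echo the privacy-policy story; it is not real data.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: the session parameter is called "sessionid", as in the example URL.
SESSION_PARAMS = {"sessionid"}


def without_session(url: str) -> str:
    """Return url with any session-ID parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def duplicate_groups(urls):
    """Group URLs that differ only by session ID; keep groups with 2+ variants."""
    groups = defaultdict(set)
    for url in urls:
        groups[without_session(url)].add(url)
    return {page: variants for page, variants in groups.items() if len(variants) > 1}


# Made-up crawl: one privacy policy reached through 3,000 different session IDs.
crawl = [f"http://www.example.com/privacy.php?sessionid={n}" for n in range(3000)]
crawl.append("http://www.example.com/index.php")
for page, variants in duplicate_groups(crawl).items():
    print(page, len(variants))
# http://www.example.com/privacy.php 3000
```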

4) Duplications Due to Printer-friendly Pages

Printer-friendly pages are also a source of duplicate content on your site. Many eCommerce and CMS web sites offer a printer-friendly version of a page, which contains all the same content as the regular page but in a format suitable for physical printing. To ensure search engines do not treat these pages as duplicate content, you can:

  • Use a meta “noindex” tag in the head section of the printer-friendly page
  • Put the specific URL in the site’s robots.txt as disallowed
  • Put rel="nofollow" on the link pointing to the printer-friendly page

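The noindex tag asks search engines not to index the printer-friendly page, the robots.txt rule asks crawlers not to fetch it, and rel="nofollow" asks them not to follow the link to it. Note that a crawler blocked by robots.txt cannot read that page’s noindex tag.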
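As an illustration of the robots.txt option (the noindex option is a single <meta name="robots" content="noindex"> tag in the page’s head), the sketch below assumes printer-friendly pages live under a hypothetical /print/ path. It uses Python’s standard urllib.robotparser to confirm that the rule blocks those URLs while leaving the regular product page crawlable.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical layout: printer-friendly pages are served under /print/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /print/
"""

robots = RobotFileParser()
robots.parse(ROBOTS_TXT.splitlines())

for url in (
    "http://www.example.com/print/product.php?item=favorite-items",
    "http://www.example.com/product.php?item=favorite-items",
):
    print("allowed" if robots.can_fetch("Googlebot", url) else "blocked", url)
# blocked http://www.example.com/print/product.php?item=favorite-items
# allowed http://www.example.com/product.php?item=favorite-items
```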

5) Duplicated Product Information

Particularly for eCommerce web sites, if you resell another manufacturer’s products, it is often fast and easy to copy the manufacturer’s product descriptions verbatim onto your web site. While this gets your products online faster, it will not help your overall SEO strategy as much as you would like.

Take the time to modify product descriptions; even slight changes will help. Search Google for the products and see what other resellers of the same products have written in their descriptions. For best results as an online reseller, make sure your product descriptions differ from those of competitors selling the same items online.
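If you have many products, a rough automated check can flag descriptions that are still close copies of the manufacturer’s text. This Python sketch uses the standard difflib module to score word-level similarity. It is not how any search engine measures duplication, and both the sample descriptions and the 0.8 cut-off are made up for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Word-level similarity: 0.0 shares nothing, 1.0 is identical."""
    return SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()


# Made-up descriptions for illustration.
manufacturer = "Durable stainless steel water bottle keeps drinks cold for 24 hours."
copied = "Durable stainless steel water bottle keeps drinks cold for 24 hours."
rewritten = "Our pick for hikers: a tough steel bottle that holds ice all day and night."

THRESHOLD = 0.8  # hypothetical cut-off for "too close to the source text"
for label, text in (("copied", copied), ("rewritten", rewritten)):
    score = similarity(manufacturer, text)
    print(label, round(score, 2), "rewrite it" if score > THRESHOLD else "ok")
```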

How to fix these duplicate content issues?

Thankfully, Matt Cutts has some great insight for all webmasters on this topic. As an eCommerce consultant and SEO professional, I would advise interested webmasters to watch Matt’s video for a thorough analysis of the duplicate content problem and its resolution.

From our experience, here is a summary of solutions for various duplicate content issues on your site:

  • Custom programming to change your CMS or eCommerce software to generate only the URLs you want. Normalize the URLs on your site;
  • Pick one “canonical” URL and link to it consistently within your site (especially in your site’s navigation and internal links);
  • Implement the canonical link element for your pages:
    <link rel="canonical" href="http://www.example.com/product.php?item=favorite-items"/>
  • Make all non-canonical URLs do a permanent (301) HTTP redirect to the preferred URL. Use server-side programming to ensure the 301 redirect is implemented correctly. Use an HTTP debugging program like Fiddler to confirm the 301;
  • Use Google’s Webmaster Tools to specify “www.” or no “www.” on your site;
  • Submit XML sitemaps to Google to break ties between candidate URLs. This is particularly important for solving item #1 above: if Google sees multiple versions of the same content at URLs differing only by a few querystring parameters, it uses the XML sitemap to break the tie and will [supposedly] index the URL listed in your XML sitemap rather than the one found in your site’s links.
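The 301 redirect option can be sketched as WSGI middleware in Python. It assumes the preferred URLs use http://www.example.com (as in the examples above), that the application is mounted at the site root, and that index.htm is the default document. Any request for another host, scheme, or /index.htm is answered with a permanent redirect that keeps the original querystring; everything else passes through to your application.

```python
from urllib.parse import quote

# Assumptions for this sketch: the preferred URLs are http://www.example.com/...,
# the app is mounted at the site root, and "index.htm" is the default document.
CANONICAL_SCHEME = "http"
CANONICAL_HOST = "www.example.com"
DEFAULT_DOCUMENT = "index.htm"


def canonical_location(environ):
    """Return the preferred URL for this request, or None if it already is one."""
    scheme = environ.get("wsgi.url_scheme", "http")
    host = environ.get("HTTP_HOST", CANONICAL_HOST).split(":")[0].lower()
    path = environ.get("PATH_INFO", "") or "/"
    query = environ.get("QUERY_STRING", "")
    preferred_path = path
    if preferred_path.lower().endswith("/" + DEFAULT_DOCUMENT):
        preferred_path = preferred_path[: -len(DEFAULT_DOCUMENT)]
    if (scheme, host, preferred_path) == (CANONICAL_SCHEME, CANONICAL_HOST, path):
        return None
    location = f"{CANONICAL_SCHEME}://{CANONICAL_HOST}{quote(preferred_path, encoding='latin-1')}"
    return f"{location}?{query}" if query else location


def redirect_to_canonical(app):
    """WSGI middleware: 301-redirect non-canonical requests, pass the rest through."""
    def middleware(environ, start_response):
        location = canonical_location(environ)
        if location is None:
            return app(environ, start_response)
        start_response("301 Moved Permanently", [("Location", location)])
        return [b""]
    return middleware
```

Answering with 301 rather than 302 is what marks the move as permanent; an HTTP debugging tool such as Fiddler will show the status code and Location header for each request.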

A final note:

Exercise caution when dealing with your web server and any server-side programming. Invite a professional team to look over your duplicate content problems and advise you on the best way to solve them, so you can get the best results for your content in search engines. For more information on solving duplicate content issues with your website, we invite you to Contact Us.


One Response to “5 Duplicate Content Issues Common to eCommerce Sites”

  1. John G. Says:

    I wrote a similar article about this subject but you did a better job :)

