Ever heard of duplicate content? If you've kept up on the scrappy marketing posts, then you've probably read about duplicate content somewhere.
I came across a recent video post by Matt Cutts from Google that addresses the question “Will eating the same sandwich every day cause Google to give my website a duplicate content penalty?” After seeing this video, I thought to myself, “Not only can I ride on his coattails, but it would be great if this concept were explained further.”
In this video, Cutts mentions the canonical link element. This is a new element you can add to a web page to tell search engines which URL is the preferred version of that page. Before I address the new canonical link element, I figured this was a good chance to write about duplicate content and what it is.
According to Matt, there's no connection between sandwiches and duplicate content. He attributes this to Google not having reached sandwich space yet; for now it is still predominantly in web space (his joke, not mine).
What is duplicate content?
My definition of duplicate content is “identical content, accessible by more than one URL.” It usually takes the form of a blog post or some other type of web content that is posted across multiple web pages or sites.
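To make that definition concrete, here is a minimal Python sketch that groups URLs by a hash of their text and reports any content reachable from more than one URL. The function name find_duplicates, the sample URLs, and the whitespace normalization are my own assumptions for illustration; this is not how any search engine actually detects duplicates.

```python
import hashlib
from collections import defaultdict


def find_duplicates(pages):
    """Group URLs whose body text is identical after whitespace normalization."""
    groups = defaultdict(list)
    for url, text in pages.items():
        # Collapse whitespace so trivial formatting differences do not matter.
        normalized = " ".join(text.split())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        groups[digest].append(url)
    # Keep only content that is reachable from more than one URL.
    return sorted(sorted(urls) for urls in groups.values() if len(urls) > 1)


# Hypothetical URLs and text, for illustration only.
pages = {
    "http://example.com/2009/02/my-post/": "Hello   world",
    "http://example.com/?p=42": "Hello world",
    "http://example.com/about/": "About us",
}
duplicates = find_duplicates(pages)
print(duplicates)
```

In this made-up example, the first two URLs end up in the same group because they serve the same text, which is exactly the "identical content, more than one URL" situation described above.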
When does this happen?
There are typically two major occurrences to consider:
- Duplicate content occurring within one website.
- Duplicate content occurring across multiple websites.
In either instance, identical content is found by search engines in multiple locations.
What happens if search engines find duplicate content?
If a search engine finds duplicate content, it must decide which sites or pages are the original and which are not. Search engines usually get it right, but in my experience they can get it wrong.
If I get caught, am I going to search engine hell?
Luckily for webmasters everywhere, search engines filter duplicated pages out of their results rather than penalizing them. The filter is even removed once you fix the duplication problem.
How do I avoid the duplicate content filter?
The best way to avoid a duplicate content filter is to use a robots.txt file to disallow search engine spiders from crawling pages that you know are duplicated. When setting up a WordPress site, it's important to have your robots.txt file tuned correctly, because WordPress often exposes multiple URLs that lead to the same posts or content. Fine-tuning your robots.txt file is a great way to avoid any possible issues.
Need a robots.txt for your WordPress site?
Just copy the text below and place it in a file named robots.txt. If you already have a robots.txt file, append the text below and re-save it.
User-agent: *
Disallow: /wp-
Disallow: /search
Disallow: /feed
Disallow: /comments/feed
Disallow: /feed/$
Disallow: /*/feed/$
Disallow: /*/feed/rss/$
Disallow: /*/trackback/$
Disallow: /*/*/feed/$
Disallow: /*/*/feed/rss/$
Disallow: /*/*/trackback/$
Disallow: /*/*/*/feed/$
Disallow: /*/*/*/feed/rss/$
Disallow: /*/*/*/trackback/$
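If you want to sanity-check rules like these before uploading them, a rough sketch using Python's standard urllib.robotparser module might look like the following. It only tests the plain prefix rules, because that module does not interpret the * and $ wildcards the way major search engines do; the helper name is_crawlable and the example.com URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# A subset of the rules from this post. Only plain prefix rules are used,
# because urllib.robotparser does not interpret the * and $ wildcards.
RULES = """\
User-agent: *
Disallow: /wp-
Disallow: /search
Disallow: /feed
Disallow: /comments/feed
"""


def is_crawlable(url, rules=RULES, agent="*"):
    """Return True if the given rules allow `agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical URLs, for illustration only.
for url in (
    "http://example.com/wp-admin/",
    "http://example.com/feed",
    "http://example.com/2009/02/my-post/",
):
    print(url, is_crawlable(url))
```

The admin and feed URLs should come back as blocked while the post itself stays crawlable. Treat this as a quick local check, not a substitute for the testing tools search engines provide.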
Ben Herman is the founder of Mad Fish SEO, an interactive search engine marketing and web development agency based in Portland, Oregon. Mad Fish SEO believes that clients set marketing expectations, but it’s Mad Fish’s job to exceed them.
tricky trick with your links in your bio Ben...
Mr Lloyd? Is Mr. Herman robbing ScrappyMarketing of its non-existent link juice?
Well there goes that! I was trying to get some of that link juice!