What is duplicate content?
For starters, let’s define the term. Duplicate content, at least as far as Google is concerned, is “substantive blocks” of content, either within a website or on two or more separate websites, that are “appreciably similar.” Of course, the two terms in quotation marks are subjective, and we can’t say definitively how Google defines them in its algorithm. But, suffice it to say, in general, it’s pretty obvious. If two web pages have the exact same text, except that a few words have been swapped out for synonyms and a couple of sentences switched around, I think we can all agree that they would be considered appreciably similar. As with just about any “black hat” technique used to improve search engine results, you usually know when you’re trying to cheat the system.
But there’s also “innocent” duplicate content – for example, you may have two (or more) versions of the same page, intended for different viewports or for clean printing. Or, perhaps, you’ve decided to include a colleague’s article on your site that you feel would be of benefit to your clients.
In general, Google doesn’t like the first type of duplicate content (the kind meant to cheat the system), but doesn’t mind the second, innocent kind.
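To make “appreciably similar” a little more concrete, here is a minimal Python sketch that compares two passages by the share of distinct words they have in common (a Jaccard similarity). This is purely an illustration and an assumption on my part, not how Google measures similarity; the function names and sample texts are made up for the example.

```python
import re


def word_set(text: str) -> set[str]:
    """Lowercase the text and return its set of distinct words."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def jaccard_similarity(a: str, b: str) -> float:
    """Share of distinct words the two texts have in common (0.0 to 1.0)."""
    words_a, words_b = word_set(a), word_set(b)
    if not words_a and not words_b:
        return 1.0  # two empty texts are trivially identical
    return len(words_a & words_b) / len(words_a | words_b)


# Hypothetical sample texts: the second swaps one word for a synonym
# and switches the sentence order, as described above.
original = "Our bakery sells fresh bread every morning. We also bake cakes to order."
reworded = "We also bake cakes to order. Our bakery sells fresh loaves every morning."
unrelated = "The quarterly tax deadline is approaching fast."

print(jaccard_similarity(original, reworded))   # high score: nearly the same words
print(jaccard_similarity(original, unrelated))  # no words in common
```

Because the comparison ignores word order, switching sentences around changes nothing, and swapping one word for a synonym barely moves the score. That is roughly why light rewording rarely disguises a copied page.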
Will I be penalized?
The big question, of course, is what effect duplicate content, maliciously intended or otherwise, will have on your website’s Google rankings. Should you be afraid of having content on your site that is too similar to other content on your site or elsewhere on the Web?
Let’s try to look at it from Google’s perspective. Google wants to present the most relevant results for every search, and it doesn’t want to repeat very similar results, since that wouldn’t benefit the searcher. So it does its best (and its best is usually pretty darn good) to show only one page out of several “appreciably similar” pages. In doing so, it tries to be as fair as possible, meaning the original version of the page is generally the one it aims to show on the results page. Given all the brain power at Google, it usually gets this right. So, even if someone plagiarizes your content, and although you may want to take legal action, Google won’t penalize you for it. In fact, it might not even penalize the plagiarizer; their page will likely simply be ignored. The same goes for re-posting, with pure intentions, others’ content on your site. That content probably won’t show up in Google search result pages, but you won’t get “dinged” for it. So, if that content is truly of value to your readers, and you have permission, by all means include it on your site (with proper credit given, of course).
Can I control which page Google shows?
If you do have two very similar legitimate versions of a page, say one for the screen and one for printing, you can use the “noindex” meta tag on one of them to tell Google to ignore that version. The same tag helps with placeholder pages. I generally recommend avoiding “coming soon” or “under construction” pages like the plague, but if you do decide to go that route, as is common when building a new website, keep in mind that Google will treat all such pages as the same and index only one of them. In that case, it’s a very good idea to put “noindex” on the placeholders, so you avoid difficulties in getting the full versions of the pages indexed when they are ready.
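For the curious, the “noindex” tag is a single line in a page’s head section. The Python sketch below shows what it looks like and checks whether a page carries it. Treat it as an illustration under my own assumptions: the sample pages and the helper names (RobotsMetaParser, has_noindex) are made up for the example.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            # A robots tag may list several directives, e.g. "noindex, follow".
            self.directives.update(part.strip().lower() for part in content.split(","))


def has_noindex(html: str) -> bool:
    """Return True if the page asks search engines not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hypothetical print-friendly page that should stay out of the index.
PRINT_VERSION = """<html><head>
<title>Pricing (print version)</title>
<meta name="robots" content="noindex">
</head><body>...</body></html>"""

# Hypothetical screen version with no robots tag, so it can be indexed.
SCREEN_VERSION = """<html><head><title>Pricing</title></head><body>...</body></html>"""

print(has_noindex(PRINT_VERSION))
print(has_noindex(SCREEN_VERSION))
```

A quick check like this can be handy before launch, when you want to confirm that placeholder pages carry the tag and the finished pages don’t.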
Another good practice is canonicalization, where you tell Google which URL is the original, essentially saying, “this isn’t the original article, but this (URL) is.” Or you may use a 301 redirect in your .htaccess file (on an Apache server), which permanently sends both visitors and search engines from the duplicate URL to the original one. (As usual, if these last couple of sentences just went “whoosh” over your head, talk to your web developer.)
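If you or your developer want to see canonicalization in practice, the Python sketch below shows the canonical link tag inside a page and reads back the URL it points to. It is only an illustration: the example.com address, the sample page, and the helper names (CanonicalParser, find_canonical) are assumptions of mine.

```python
from html.parser import HTMLParser


class CanonicalParser(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> tag it sees."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attributes.get("href")


def find_canonical(html: str):
    """Return the canonical URL a page declares, or None if it declares none."""
    parser = CanonicalParser()
    parser.feed(html)
    return parser.canonical


# Hypothetical re-posted article that points back to the original URL.
REPOSTED_ARTICLE = """<html><head>
<title>Guest article</title>
<link rel="canonical" href="https://example.com/original-article">
</head><body>...</body></html>"""

print(find_canonical(REPOSTED_ARTICLE))
```

The tag itself is the important part. The checker is simply a convenient way to confirm that a re-posted page really does point back to the original.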
Best practices
As with all things related to search engine optimization, the rule of thumb is: if it feels like manipulation, Google will probably look at it that way and, indeed, penalize your site. However, if your intentions are pure and you truly mean to provide your site visitors with content that is useful to them, you’re unlikely to have a problem.
So, while I wouldn’t recommend intentionally duplicating significant amounts of content on your website, I wouldn’t worry about having some, as long as it legitimately serves a purpose. Think about your site visitors first and foremost and, if you do that properly, Google will follow.