Panda Cleanup: Hunting Down Duplicate Content

Ken Hardin

Ken Hardin is an independent business analyst and management consultant at Clarity Answers.

Website publishers have had about three months to react to Google's "Panda" algorithm update, which primarily targeted "content farms." The Web search giant has long aimed to squash these sites from its results because it views their information as basically identical to that found on other sites, or just too sparse to be useful.

Google claimed that Panda, at least in its initial late February release, affected only about 2 percent of U.S. search queries. However, the dramatic hits taken by some well-established sites dominated headlines in the search engine sector for months after the rollout.


"When a client's site is pretty clean, it's been a pretty easy cleanup," said Dave Davies, the CEO of Beanstalk, an SEO consultancy based in Victoria, British Columbia, Canada. Davies did note that Panda has been "huge" in terms of impact, but sites are making a comeback. "Google is not really interested in punishing sites long term; it is just trying to protect its results," he said. "So, if you correct your issues, you can re-build your reputation."

Davies and other SEO and Search Engine Marketing (SEM) professionals who spoke to IT Business Edge suggested that if an otherwise reputable site took a hit after the Panda rollout, it probably was because it:


  1. Failed to take Google's advice over the last few years on technical housekeeping issues related to the way Google identifies and filters out duplicate content from its results; or
  2. Had not adequately invested in what Google would determine to be high-value content; or
  3. Relied too heavily on just a few link-building tactics or the grey or black-hat margins of SEM, and suffered collateral damage as Google purged disreputable pages and the reputation they had previously passed on to their own domains.


"Google doesn't think like your business," said Matt Law, the founder of Law Marketing Systems, an Internet marketing and SEO consultancy based out of Orlando, Fla. "They think like a bunch of California yuppies who run the Internet. And they do."

In this first part of our two-part series, we'll look at the issues surrounding the "Panda penalty" for duplicate content and the best tactics for ensuring Google and other search engines identify your site pages as unique. In part two, we will look at the general guidance Google and SEO experts offer for making sure your content is valuable enough to rank highly in search results. We'll also take a look at how Google may be using user behavior and social interactivity to gauge that value.

So, what is the big deal about duplicate content, anyway?

Duplicate Content on Your Own Site

Google's ideal Web page is one that it determines to be both useful and unique: it appears only on your site, and only once there.

However, dynamically generated websites tend to shuffle blocks of content to build pages for various purposes, such as site search results. Google's general guidelines have long advised webmasters to block search results pages from its indexing spiders. (Common wisdom is that, among other issues, Google considers it a bad user experience when someone clicks on one of its search results only to land on another page of search results at the destination site.) Google also uses page metadata such as Title and Description to identify pages as being unique, and encourages webmasters to make sure each of their pages has distinct values.
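As a rough illustration of the two housekeeping checks described above, the Python sketch below flags pages that share the same Title and Description, and tests whether a robots.txt rule keeps a crawler away from a search results URL. It is a minimal audit script, not a description of how Google itself works. The function names, the sample paths and the "/search" rule are hypothetical, and robots.txt is only one common way of keeping spiders off a page.

```python
from collections import defaultdict
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class _MetaExtractor(HTMLParser):
    """Collects the <title> text and the meta description of one page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attr = dict(attrs)
            if (attr.get("name") or "").lower() == "description":
                self.description = (attr.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def find_duplicate_metadata(pages):
    """Return {(title, description): [urls]} for values used by 2+ pages.

    `pages` is a hypothetical {url: html} mapping supplied by the caller.
    """
    groups = defaultdict(list)
    for url, html in pages.items():
        parser = _MetaExtractor()
        parser.feed(html)
        parser.close()
        key = (parser.title.strip().lower(), parser.description.lower())
        groups[key].append(url)
    return {key: urls for key, urls in groups.items() if len(urls) > 1}


def is_blocked(robots_txt, url, agent="Googlebot"):
    """True if the given robots.txt text disallows `agent` from `url`."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return not rules.can_fetch(agent, url)
```

A site owner could feed the first function the HTML of every indexable page and rewrite the metadata of any group it returns, and use the second to confirm that internal search results URLs are actually off limits to crawlers.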


When Google determines a page to be duplicate content, it basically nullifies the page's value, or reputation, in its index. It may effectively purge the page altogether, if it determines the page to be of negligible value or simply a duplication of another page it has already indexed.

If that page is on your site, all the "link juice" it would otherwise pass along to other pages that it links to will be gone, creating a chain reaction of sorts inside your site and potentially devaluing a lot of pages. Davies called it a "credibility loop," and it's from here that Panda's wide-ranging impact emanates.
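A site owner who wants to spot internal duplicates before they set off that kind of chain reaction could start with something like the Python sketch below. It groups URLs whose main text is identical once case and whitespace are normalized. This is an assumption-laden audit aid, not Google's detection method: it catches only exact copies, and the `pages` mapping of URL to extracted text is hypothetical.

```python
import hashlib
import re
from collections import defaultdict


def content_fingerprint(text):
    """Hash the text after lowercasing it and collapsing whitespace."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_duplicate_bodies(pages):
    """Return sorted lists of URLs whose normalized text is identical.

    `pages` is a hypothetical {url: main_text} mapping.
    """
    groups = defaultdict(list)
    for url, text in pages.items():
        groups[content_fingerprint(text)].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]
```

Near-duplicates, such as the same article wrapped in slightly different navigation, would need a fuzzier comparison than an exact hash.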

More page types viewed as duplicate?

With Panda, experts agree that Google may have amped up its disdain for some page types.

Shimon Sandler, an independent consultant who focuses primarily on internal content quality and site optimization issues, said he has determined that Google may well be devaluing tag pages as being indicative of content farming. Tag pages -- essentially a type of search result that collects links to multiple content items based on topical tags -- have long been a staple of site navigation in widely used and generally respected content management systems such as WordPress (which Sandler uses to publish his own site, coincidentally).

Sandler noted that Google has long exhibited its dislike for user-generated tags, as evidenced by its move to remove them from YouTube after its acquisition of the user-submitted video site in 2006. It generally views them as ripe for spam, he said, and Google hates spam. However, tag pages created by site operators as a navigation metaphor probably aren't going away anytime soon.

Sandler advises that tag pages should be clearly identified to Google, but declined to share his specific tactics for suppressing or re-directing them, saying half-jokingly, "Give them a link to my site and have them call me."






