By Danny Sullivan, Editor
February 4, 2003
The Search Engine Report, Feb. 4, 2003
Few issues have divided the search engine marketing
community more than cloaking. There is a segment that firmly believes
they should have the right to cloak their content from users, while another
group strongly feels this is a deceptive tactic. Get the two talking about
it and tempers flare.
Complicating the issue is the fact that while some
search engines have guidelines against cloaking, arguments can also be made
that these same search engines still allow it or even themselves practice
it. In addition, just trying to agree on "what is cloaking" can lead to
frustration on both sides.
Search engine marketer Alan Perkins hoped to clarify
matters by publishing his
Cloaking Is Always A Bad Idea article last month. Instead, the article
has renewed the debate over cloaking, but perhaps in a helpful manner.
Below, we take a look at why people have traditionally
cloaked, how XML feeds these days provide a form of approved cloaking and
why the bigger issue to focus on isn't whether cloaking is allowed but
instead whether paid content gets more liberal rules about acceptability.
Do Things For Humans, Not Search Engines
Perkins is a firm believer in doing things for humans,
rather than search engines. His view is that people should make good
content, then only tweak it to the degree that search engines have commonly
encouraged, such as writing good title tags or making pages easily found via
site maps.
"Suppose search engines did not exist. Would the
technique still be used in the same way?" he asks, in his
The Classification Of Search Engine Spam white paper. It's generally
good advice that I agree with. I have always strongly recommended that
people create excellent content, then do the small, simple things that can
improve rankings. However, Perkins's guidelines can be too restrictive in
requiring that people make search engine-specific changes only if they have
"written permission" (presumably through published guidelines).
For example, Google recently updated its
webmaster
guidelines and offers a lot of great, practical advice there. This
doesn't mean that everything is covered, however. For instance, let's say
you had a very long page about buying a used car. One section might deal
with places that sell used cars, while the second covers negotiating a deal
and a third section discusses how to inspect a car properly before buying.
For search engine purposes, it would be wise to break that page up into
three different pages, so that each of the new pages is more firmly focused
on one of these subtopics.
Such an action wouldn't be done primarily for humans.
Actually, humans might prefer one big page. However, it's such a subtle
change that I doubt any search engine, including Google, would fault you for
doing it. I think the "spirit of the basic principles" that Google describes
in its guidelines is still being followed. You made good content not
specifically for Google but instead for humans. While breaking up the page
is something you specifically may have done to help Google index and rank
the content, it remains primarily good content that you are offering to
users.
Not Everyone Agrees With The Search Engines
The "Would I do this if search engines didn't
exist?" question that Google suggests you ask yourself in
its guidelines is excellent advice and clearly follows on what Perkins
preaches. It's also the same advice that generally any search engine would
tell you, when discussing getting listed outside their paid inclusion
programs. But how does all this get back to the debate on cloaking?
Well, not everyone agrees with Perkins. Indeed, not
everyone agrees with Google or the other search engines, when they issue
guidelines. There's always someone who feels they have a situation that
justifies doing something specifically for the search engines, rather than
humans.
Sometimes, the justification is the technical
limitations of search engines, such as:
- You can't read my Flash content, so I'm building a
page just for you to index.
- I have a dynamic web site that you refuse to index,
so I'm creating a static page that describes each of my products.
Other times, it's simply that the marketer believes
it's a jungle out there, so they'll do whatever they feel is best to compete.
"I know people are building pages specifically to please your algorithm and
getting away with it, so I'm going to do the same!" It's also common that
they know they may get caught, perhaps banned, but decide that's a risk
they'll take.
Doing The Doorway Page Dance
Those in the "do whatever it takes" camp tend to be
practitioners of what are called "doorway pages." They also have been known
by other names, and you can read my older article,
What Are
Doorway Pages, to learn some of their other names and more about them.
However, the idea behind doorway pages is simple. You make a page targeting
a particular search term, tweaking the title tag, the meta tags and the body
copy in a way that you hope pleases the search engine's algorithm.
The result is often a very ugly page that you'd never
want a human being to see. For instance, I recently needed to find some
information about the movie "Thomas and The Magic Railroad" for my son, and
I needed to go to someplace other than the official fan site. So at Google,
I searched for "thomas and the magic railroad fan site." The last of the top
pages listed was this:
Come here for thomas kinkaid
... kinkaid - the princess bride movie three six mafia mp3
the offspring music
three six mafia photos the oreilly factor show thomas magic railroad three
dog night ...
See the description? The text is nonsensical. This
page simply has a bunch of words on it. The person who created it simply
hopes that some of the words will somehow form a match that pleases Google.
It's also not a sophisticated doorway page attempt, but it worked for this
extremely long query that I ran at Google.
Doorway pages were a popular tactic in the late 1990s,
but they've declined for several reasons. Better use of link analysis is a
key factor. Google, along with all the other major crawlers now, needn't
depend just on a page's content to know what it is about. They can analyze
links to understand both the content and popularity of pages. This means for
popular queries, doorway pages have a much harder time succeeding than in
the past, outside of paid inclusion programs.
In addition, the emergence of paid placement programs
has also had an impact on running doorways. Lots of effort can be put into
doorway pages, yet they offer no guarantee of ranking well. When paid
placement came along, it provided the guarantee of top rankings. That comes
at a price, but the price may be well worth paying, when measured up against
time spent on doorways and uncertainties.
Finally, paid inclusion has provided a solution to
some of the reasons that doorway pages were initially deployed. For example,
those with dynamic content that might ordinarily be missed by some crawlers
can now use paid inclusion programs as a way to get indexed without going
the doorway route.
Bring On The Cloaking
While doorway pages as traditionally done are in
decline, they still exist. The chief problem with them also
remains: they aren't content that you want users to see. In the example
above, imagine your reaction in coming to the page with a ton of nonsensical
content. You'd go away. This is why doorway use is often accompanied by
cloaking.
When cloaking, you show the search engine something
different than what you show a user. There are many ways to cloak, but those
who are serious about it typically do what's called IP cloaking. This means
that you know all the internet addresses that the major search engines'
spiders use when they access the web, their "internet protocol" addresses.
That's the IP in IP cloaking. If you see a request come from one of these
known addresses, then you deliver your custom content. Meanwhile, a human
user sees something different.
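The mechanism described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular cloaking package works. The function names are hypothetical, and the address ranges are reserved documentation networks rather than the addresses of any real search engine spider.

```python
import ipaddress

# Hypothetical spider address ranges (an assumption for illustration).
# These are reserved documentation networks, not real spider addresses.
KNOWN_SPIDER_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_known_spider(remote_addr: str) -> bool:
    """Return True if the requesting address falls in a listed spider range."""
    address = ipaddress.ip_address(remote_addr)
    return any(address in network for network in KNOWN_SPIDER_NETWORKS)


def choose_content(remote_addr: str, spider_page: str, human_page: str) -> str:
    """Pick the page to return based only on the requester's IP address."""
    return spider_page if is_known_spider(remote_addr) else human_page
```

A request from a listed address would receive the custom page, and every other request would receive the page meant for human visitors. The decision depends entirely on who is asking, not on which URL was requested.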
The page in the example above used cloaking. When I
examined it, the content wasn't nonsensical. Instead, I got a simple page,
easily readable, with two links that led me to get product information
about Christian artist Thomas Kinkade at other web sites. The person behind
this page no doubt earns affiliate fees from clicks off this page. No doubt,
the page will also be removed by Google soon. Google has a specific ban
against cloaking and may take action against pages doing this.
Cloaking Does Not Equal Spam
Hopefully, I've by now explained two completely
different tactics in search engine marketing: doorway pages and cloaking.
The two are not the same, though they often go hand-in-hand. Doorway pages
are the effort to "crack" or "please" a search engine's algorithm. In
contrast, cloaking does nothing to please an algorithm. It's merely a way of
delivering targeted content.
This is an important distinction to make, because some
people like Perkins want to declare that cloaking is automatically equal to
spamming search engines. To me, that's not necessarily the case.
Spam is often cloaked, absolutely. Google certainly
considers cloaking to be spam. Both Inktomi and Teoma have guidelines
against it, as well. However, as we'll see, I'd argue that they allow
cloaking via their paid inclusion programs. Meanwhile, FAST and AltaVista
actually have no written guidelines that I can find against cloaking.
Finally, and this is the most important point, by
declaring cloaking to automatically be spam, Perkins leaves himself open to
the pro-cloaking arguments he most wants to stop, and stop with good reason.
Everyone Cloaks!
Perkins wants to define cloaking in a technical
manner: "If you need to know a search engine's IP address or some details
from its HTTP request (e.g., its user agent name) in order to deliver
content, you are probably cloaking. If you don't need that information, then
you are certainly not cloaking."
The problem is, not everyone agrees with Perkins's
definition. For example, those defending cloaking like to talk about the
fact that in some countries, if Google detects you are outside the US when
trying to reach Google.com, it will redirect you to your "local" site.
Personally, I wouldn't define that as cloaking, since
I've never seen the case where Google has shown someone something different
on the Google.com home page, depending on their country -- and I've gone to
Google.com from a variety of different countries. If something happens,
it usually is that you try to reach Google.com and instead get redirected to
a completely different, non-Google.com web site.
A better example for those who want to say that Google
cloaks might be when you do a search there. If I search at Google.com from
where I live in the UK, I get ads targeted to those in the UK. That's a
completely different experience than what a user in the US would get, yet
we'd view the same URL.
An even better "everyone cloaks" argument is that
cloaking is even built into some web server software. For instance, let's
say you build a web site with three different versions of pages, a text-only
version, a version for those using Internet Explorer and one for those using
Netscape's browser.
Your web server allows you to target all those people
who have IE or Netscape and show them custom versions. Everyone else gets
the text-only version. So, a search engine spider coming to the site sees
the text-only content, which is different than what the vast majority of
your users see. You haven't even actively set up your web server to do this,
but it happens -- and could be considered cloaking.
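As a rough sketch of that scenario, here is how a server might pick one of the three versions from the browser name sent with a request. The function name, the matching rules and the sample user agent strings are simplified assumptions for illustration, and "ExampleSpider" is a made-up spider name.

```python
def choose_variant(user_agent: str) -> str:
    """Return which of three page versions a request would receive.

    The matching rules are deliberately simplified assumptions.
    """
    agent = user_agent.lower()
    if "msie" in agent:
        # Internet Explorer identifies itself with an "MSIE" token.
        return "internet-explorer"
    if agent.startswith("mozilla/"):
        # Treat any remaining Mozilla-style agent as Netscape.
        return "netscape"
    # Anything unrecognized, including a spider, gets the fallback.
    return "text-only"


ie_version = choose_variant("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)")
netscape_version = choose_variant("Mozilla/4.79 [en] (Windows NT 5.0; U)")
spider_version = choose_variant("ExampleSpider/1.0")
```

Nothing here singles out search engines. The spider ends up with the text-only version simply because it is not one of the two browsers the site targets.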
Coincidentally, at the same time Perkins posted his
article, WebmasterWorld.com owner Brett Tabke posted his own thoughts on
mainstream cloaking.
That forum thread starts out with more examples of cloaking being argued as
commonplace.
"Everyone cloaks" arguments can infuriate the
anti-cloaking crowd. They find the arguments merely an attempt to confuse
people about what "real" cloaking is, and I've literally seen people turn
red trying to push back acceptance of these other examples as cloaking.
There's definitely truth in what the anti-cloakers
say. Some search engine marketers definitely employ the "everyone cloaks"
defense as a means to get clients to sign on to potentially risky campaigns,
which is the most worrying issue to me. But others have real differences of
opinion as to what cloaking is. In my view, coming up with a definition
that accommodates them, as well as the anti-cloakers, is the only way
forward on this issue.
Cloaking Doesn't Kill Search Engines; Spam Kills Search Engines
My solution, I hope, is simple. I suggest that we
define cloaking not by technical terms but instead by the end result:
"Cloaking is getting a search engine to record content for a URL that is
different than what a searcher will ultimately see, often intentionally."
Unlike Perkins, I don't care
how the cloaking is done technically, whether it is by user agent detection,
IP detection, "poor man's cloaking" by placing content within a noframes
area, hiding content with layers using cascading style sheets or whatever.
If the typical searcher sees something different than the content of the
page recorded in the search engine's index, that's cloaking. This also fits
in with the guidelines we do have from three of the crawler-based search
engines that offer them:
- GOOGLE: The term "cloaking" is used to describe a website that returns
altered webpages to search engines crawling the site. In other words, the
webserver is programmed to return different content to Google than it
returns to regular users, usually in an attempt to distort search engine
rankings.
- INKTOMI: Pages that give the search engine a different page than the
public sees (cloaking).
- TEOMA: Web pages that show different content than the spidered pages.
Of these, only Google suggests a technical definition
of cloaking with its statement about a webserver being "programmed" to
deliver custom content. I'm being broader than this, but I also think that
fits in well with Google's other guidelines in general that warn against
hiding information from users.
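A definition based on the end result can be checked without knowing anything about the delivery method: compare the text recorded for a URL with the text a searcher sees at that URL. The sketch below does this with a rough text-similarity score. The function names and the 0.5 threshold are arbitrary assumptions for illustration, not a test that any search engine is known to use.

```python
import difflib


def normalize(text: str) -> str:
    """Lowercase the text and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def similarity(recorded: str, seen: str) -> float:
    """Return a 0.0 to 1.0 score for how alike the two texts are."""
    matcher = difflib.SequenceMatcher(
        None, normalize(recorded), normalize(seen), autojunk=False
    )
    return matcher.ratio()


def looks_cloaked(recorded: str, seen: str, threshold: float = 0.5) -> bool:
    """Flag a URL whose recorded text differs substantially from what is seen.

    The threshold is an arbitrary assumption, chosen only for illustration.
    """
    return similarity(recorded, seen) < threshold
```

The check would flag a page regardless of whether the difference came from IP detection, user agent detection or something else, which is the point of defining cloaking by its outcome.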
Indeed, "cloaking is hiding," summarizes Jill Whalen,
the search engine marketer who originally published Perkins's article in her
popular High Rankings
Advisor newsletter, and who then diligently followed the debate that broke
out at the ihelpyou forums (Why
Cloaking Is Always A Bad Idea) and WebmasterWorld.com
(Cloaking
Gone Mainstream). Both threads provide excellent
views on this subject.
Yes, exactly that. Cloaking is hiding. Even hiding
text by making it the same color as the background of a web page ("invisible
text") is a form of cloaking. Low tech, but cloaking all the same.
Another crucial difference between my definition and
that of Perkins is that I do not automatically declare cloaking to be spam.
This is an important distinction, if the goal is to help educate people
about the potential problems associated with cloaking.
It's also important, because even though Perkins says
in his article that all the search engines say "don't cloak," as I've
written, AltaVista and FAST don't actually say this at all, in their
webmaster guidelines. In addition, both of them as well as Inktomi and Teoma
arguably allow cloaking via XML feeds, as I'll conclude.
Even Google, despite its ban, might be considered to
allow cloaking when some of the "everyone cloaks" examples are employed. Of
course, anyone who thinks such arguments will protect them from Google, if
caught cloaking, is more than likely to lose the battle. But, we'll come
back to this.
To Win For Free, Focus On Content
It bears repeating. Cloaking often goes hand-in-hand
with low-quality doorway pages, which search engines often consider spam. If
you are considering cloaking, it is probably because you are creating
content that you hope will please a search engine's algorithm, rather than
content that should exist primarily to please human visitors. Such efforts
can often be time-consuming, not yield the expected results and may only
work for a short time.
I've sometimes used a bicycle metaphor to explain
this. Those who create doorway pages are like someone who jumps on a bike,
sprints forward and leaves you, the quality content builder, behind.
Eventually the sprinter tires. You overtake them, without having to do any
additional effort. In addition, sometimes the "sprinter" never even
overtakes you in the first place.
OK, so it's also the tortoise-and-hare story told
again. However, it remains true. All the search engines reward good content.
This is especially so because good content attracts those crucial links that
everyone wants. Focus on good content, and when it comes to getting listed
"for free" in the editorial results of the major crawlers, you are playing
the smart, long-term game.
Approved Cloaking & XML Feeds
Things are different when it comes to paid inclusion,
which all the major crawlers but Google offer. In particular, all the paid
inclusion crawlers have ways for content providers to "feed" them
information via XML.
To understand this process, picture a spreadsheet that
has all the URLs you want listed, row by row. Information about each URL is
listed in the columns -- the title of each URL in the first column, the
description of each URL in the next column, the "body copy" of each URL in
the next column and so on. It's not really web pages that are read but tabular
information about URLs that is pumped into the search engines.
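To make the spreadsheet picture concrete, here is a minimal Python sketch that turns rows of URL information into an XML document. The element names (feed, item, url, title, description, body) and the example.com URLs are invented for illustration. Each search engine defines its own feed format, and none of them is reproduced here.

```python
import xml.etree.ElementTree as ET

# Each dictionary is one "row" of the spreadsheet: one URL plus its columns.
# The URLs and text are made-up sample data.
ROWS = [
    {
        "url": "http://www.example.com/widgets/blue",
        "title": "Blue Widgets",
        "description": "Our range of blue widgets.",
        "body": "Blue widgets come in three sizes.",
    },
    {
        "url": "http://www.example.com/widgets/red",
        "title": "Red Widgets",
        "description": "Our range of red widgets.",
        "body": "Red widgets come in two sizes.",
    },
]

COLUMNS = ("url", "title", "description", "body")


def build_feed(rows) -> str:
    """Serialize rows of URL information as a single XML string."""
    feed = ET.Element("feed")  # hypothetical root element name
    for row in rows:
        item = ET.SubElement(feed, "item")
        for column in COLUMNS:
            ET.SubElement(item, column).text = row[column]
    return ET.tostring(feed, encoding="unicode")


feed_xml = build_feed(ROWS)
```

Notice that the title, description and body copy are supplied directly for each URL. Nothing in the feed itself guarantees that they match what a visitor finds at that address.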
To me, XML feeds are a form of approved cloaking.
That's not why most people use them, nor should that be the main reason you
consider them. XML feeds really were not initially intended to be a new way
for marketers to cloak low-content doorway pages but rather a simple way of
feeding in dynamic content such as a product database. If you are an online
merchant, XML feeds make a lot of sense to consider.
Having said all those disclaimers, there's no doubt
that some people are indeed using XML feeds as a way to cloak doorway pages.
Moreover, they have the approval of the search engines, since these feeds
are reviewed by the search engines for quality. In addition, there's
evidence that being in these programs may help such content compete better
for rankings than if it were picked up for "free."
In November, I looked at this
situation with AltaVista (and if you are really interested, Search
Engine Watch members got a much more
detailed look). This month, for Search Engine Watch
members, I look more closely at the
situation with Inktomi. As with AltaVista, there's evidence that XML
feeds have been an effective way for some companies to feed and cloak
content that might not otherwise have met Inktomi's content guidelines.
Inktomi admits that its XML feeds do technically
violate its posted guidelines about cloaking and says it's now looking to
amend these. However, that's not really the key concern. Instead, the real
issue is that XML feeds and perhaps paid inclusion in general are allowing
some people to provide content in a radically different way than has been
generally accepted when content is gathered for free.
In particular, promotion that people have done in the
past via traditional doorway pages and cloaking -- and have been banned for
-- can now be done under the guise of content feeding, with the search
engines that offer this. That's why I feel it's almost naive to be arguing
about whether cloaking is an acceptable delivery mechanism these days,
except in the case of Google.
For the others, the important issue revolves around
content standards. If low-content doorway pages are not acceptable editorial
content when found by a search engine spider naturally, should they suddenly
be OK when read via paid inclusion programs? If there's a debate to be
had, this is it.
Avoiding Trouble With Cloaking
As said, to me XML feeds are a form of approved
cloaking. I suspect some search engines also may allow some ordinary HTML
pages to be cloaked via their non-XML paid inclusion options, as well --
something I hope to clarify in the future.
Also as said, some may argue that my broad definition
of cloaking means that Google might knowingly allow it, in some cases. For
example, it's possible that the site throwing out text-only pages might get
banned by Google for "accidentally" cloaking, then upon review might have
that penalty lifted.
Ah, ha! Proof that Google has allowed cloaking. If so,
so what? Google clearly reserves the right to do whatever it wants when it
comes to cloaking, when it warns that those who cloak "may" get
permanently banned, rather than saying they "will" get banned.
Maybe Google has let a site "technically" cloak or
perhaps even overtly cloak, for some reason. Banking on that individual
decision to defend yourself if you actively cloak against Google is just
foolish. Instead, I would say most people who choose to show Google cloaked
content do so knowing that they may get caught and tossed out.
Overall, I'll leave you with my definition of
cloaking, backed up by some additional guidelines that I think will steer
you away from trouble:
"Cloaking is getting a search engine to record content for a URL that is
different than what a searcher will ultimately see, often intentionally.
It can be done in many technical ways. Several search engines have explicit
bans against unapproved cloaking, of which Google is the most notable one.
Some people cloak without approval and never have problems. Some even may
cloak accidentally. However, if you cloak intentionally without approval --
and if you deliver content to a search engine that is substantially
different from what a searcher sees -- then you stand a much larger
chance of being penalized by search engines with penalties against
unapproved cloaking. If in doubt, ask the search engine if it has a problem
with what you intend to do, assuming you can't get a clear answer from
written guidelines that are provided. If you are working with a third-party
search engine marketer, ask them for proof that what they intend to do is
approved. Otherwise, be prepared for any adverse consequences."
I'd like to say all the search engines will promptly
respond if asked, but they probably won't, except to those in paid inclusion
programs. Still, if you've asked and ended up in trouble, then you can at
least show you tried to get clarification. If you aren't an "industrial
strength" cloaker, that may help.
As for working with third-party firms, understand what
they are doing for you. Ask to know if there are any potential risks and get
this spelled out in advance. If you aren't comfortable, walk away.
Someone who's going to engage in unapproved cloaking
and who is professional will tell you the risks and not try to make you
think that cloaking content is something "everyone does." Instead, they'll
explain why they do it, why they think it works and what the possible
downsides will be. They'll do this because they often work with clients
prepared to take those risks, so they aren't interested in trying to
disguise what they are doing.
2003: The Year Of Paid Inclusion
Let me conclude by going back to what I said was the
real issue in this debate, that content standards seem to have changed, as
most crawlers have become dependent on paid inclusion as a revenue
generator.
The standards are for the search engines to change, of
course. Nor does having different standards -- perhaps more liberal
standards -- for paid content necessarily mean that users or relevancy is
harmed. However, it does create confusion and concerns.
For paid inclusion to succeed, we're going to need the
providers to be much clearer about exactly what benefits and advantages are
provided over unpaid content. That's going to help search engine marketers
trying to make purchasing decisions, as well as users evaluating the results
they receive.
I also expect that paid inclusion content will
ultimately need to be segregated from unpaid content, the more that content
guidelines diverge. As I wrote in my previous article about
issues
with paid inclusion at AltaVista, such segregation may have positive
benefits for both search engine marketers and users.
If the search engines fail to do this voluntarily, I
think it's likely we'll see a third party such as the US Federal Trade
Commission suggest it happen. In 2002, the
FTC told
those carrying paid placement listings to clean up their acts. In 2003, the
agency's aim may shift to issuing new, stricter guidelines about paid
inclusion listings.