Keyword Density - More Than Meets the Eye:
--------------------------------------------------------------------------------
One of the standard elements of web page optimization is Keyword Density: up
until very recently, the ratio of keywords to the rest of the body text was
generally deemed to be one of the most important factors employed by search
engines to determine a web site's ranking.
However, this basically linear approach is now gradually changing: as
mathematical linguistics and automatic content recognition technology
progress, the major search engines are shifting their focus towards
"theme"-biased algorithms that no longer rely on the analysis of individual web
pages but, rather, evaluate whole web sites to determine their topical focus or
"theme" and its relevance in relation to users' search requests.
This is not to say that keyword density is losing importance, quite the
contrary. However, it is turning into a far more complex matter than a simple
computation of word frequency per web page can handle.
Context analysis is now being determined by a number of auxiliary linguistic
disciplines and technologies, for example:
semantic text analysis
lexical database technology
distribution analysis of lexical components (such as nouns, adjectives, verbs)
evaluation of distance between semantic elements
pattern recognition based on AI and data mining technology
term vector database technology, etc.
All these are now contributing to the increasing sophistication of the relevance
determination process. If you feel this is beginning to sound too much like
rocket science for comfort, you may not be very far from the truth: it seems
that the future of search engine optimization will be determined by what the
industry is fond of calling the "word gurus".
A sound knowledge of fundamental linguistic methodology plus more than a mere
smattering of statistics will most probably be paramount to achieving
successful search engine rankings in the foreseeable future. Merely repeating
the well-worn mantra "content is king!", as some of the less qualified SEO
professionals and very many amateurs are currently doing, may admittedly have a
welcome sedative effect by creating a feeling of fuzzy warmth and comfort. But
for all practical purposes it is tantamount to whistling in the dark and fails
miserably in doing justice to the overall complexity of the process involved.
It should be noted that we are talking present AND future here: many of the
classical techniques of search engine optimization are still working more or
less successfully, but there is little doubt that they are rapidly losing their
edge and will probably be as obsolete in a few months' time as spamdexing or
invisible text - both optimization techniques well worth their while throughout
the 90s - have become today.
So where does keyword density come into this equation? And how is it determined
anyway?
There's the rub: the term "keyword density" is by no means as objective and
clear-cut as many people (some SEO experts included) would have it! The reason
for this is the inherent structure of hypertext markup language (HTML) code: as
text content elements are embedded in clear text command tags governing display
and layout, it is not easy to determine what should or should not be factored
into any keyword density calculation.
The matter is complicated further by the fact that the meta tags inside an HTML
page's header may contain keywords and description content: should these be
added to the total word count or not? Seeing that some search engines will
ignore meta tags altogether (e.g. Lycos, Excite and Fast/Alltheweb), whereas
others are still considering them (at least partially), it gets even more
confusing. What may qualify for a keyword density of 2% under one frame of
reference (e.g. including meta tags, image ALT attributes, HTML comments, etc.)
may easily be reduced to 1% or less under another. Further questions arise: will
meta tags following the Dublin Core convention ("DC tags") be counted in or not?
And what about HTTP-EQUIV tags? Would you really bet the ranch that TITLE
attributes in tables, forms or DIV elements will be ignored? Etc., etc.
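
To make the frame-of-reference problem concrete, here is a minimal Python
sketch that computes the density of one keyword for the same page under two
different counting rules. Everything in it is an assumption made for
illustration only: the sample page, the names TextCollector and
keyword_density, and the decision to count just the "keywords" and
"description" meta tags plus image ALT text. It does not reproduce the method
of any actual search engine.

```python
import re
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical helper: gathers the text that a given frame of
    reference decides to count (body text, optionally meta and ALT)."""

    def __init__(self, include_meta=False, include_alt=False):
        super().__init__()
        self.include_meta = include_meta
        self.include_alt = include_alt
        self.in_body = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "body":
            self.in_body = True
        elif tag == "meta" and self.include_meta:
            # Assumption: only keywords and description meta tags count.
            if (attrs.get("name") or "").lower() in ("keywords", "description"):
                self.chunks.append(attrs.get("content") or "")
        elif tag == "img" and self.include_alt:
            self.chunks.append(attrs.get("alt") or "")

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False

    def handle_data(self, data):
        # Visible text is collected only while inside <body>.
        if self.in_body:
            self.chunks.append(data)


def keyword_density(html, keyword, include_meta=False, include_alt=False):
    """Return occurrences of keyword divided by total counted words."""
    collector = TextCollector(include_meta, include_alt)
    collector.feed(html)
    collector.close()
    words = re.findall(r"[a-z0-9]+", " ".join(collector.chunks).lower())
    return words.count(keyword.lower()) / len(words) if words else 0.0


SAMPLE = """<html><head>
<meta name="keywords" content="widgets, blue widgets">
<meta name="description" content="Cheap widgets for sale">
</head><body>
<p>We sell widgets of every size.</p>
<img src="w.png" alt="widgets photo">
<p>Order today and save money now.</p>
</body></html>"""

body_only = keyword_density(SAMPLE, "widgets")
everything = keyword_density(SAMPLE, "widgets", include_meta=True, include_alt=True)
print(f"body text only:      {body_only:.1%}")   # 1 of 12 words -> 8.3%
print(f"with meta and ALT:   {everything:.1%}")  # 5 of 21 words -> 23.8%
```

The page has not changed between the two calls, only the rule for what counts
as a word on it, and yet the reported density differs by almost a factor of
three.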
Another fundamental factor generating massive fuzziness left, right and center
is the issue of semantic delimiters: what's a "word" and what isn't? Determining
a lexical unit (aka a "word") by punctuation is a common though pretty low-tech
method which may lead to some rather unexpected results.
Say you are featuring an article by an author named "John Doe" who happens to
sport a Master of Arts degree, commonly abbreviated as "M.A.". While most
algorithms will correctly count "John" and "Doe" as separate words, the "M.A."
string is quite another story. Some algorithms will actually count this as two
words ("M" and "A") because the period (dot) is considered a delimiter -
whereas others (surprise!) will not. But how would you know which search engines
are handling it in which way? Answer: you don't, and that's exactly where the
problems start.
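
The "M.A." case can be sketched in a few lines of Python. The two tokenizer
functions below are hypothetical stand-ins for the two behaviors described
above, and the sample sentence is invented; neither function is claimed to
match what any particular search engine does.

```python
import re

TEXT = "An article by John Doe, M.A."


def split_on_punctuation(text):
    """Treat every non-alphanumeric character, the dot included,
    as a word delimiter."""
    return re.findall(r"[A-Za-z0-9]+", text)


def keep_abbreviations(text):
    """Keep dotted abbreviations such as 'M.A.' together as one token;
    everything else is split as above."""
    return re.findall(r"(?:[A-Za-z]\.){2,}|[A-Za-z0-9]+", text)


for tokenize in (split_on_punctuation, keep_abbreviations):
    words = tokenize(TEXT)
    density = words.count("John") / len(words)
    print(f"{tokenize.__name__}: {len(words)} words, "
          f"density of 'John' = {density:.1%}")
```

With the first rule the sentence has seven words, with the second only six, so
the density of any keyword in it shifts even though not a single character of
the text has changed.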
The only feasible approach to mastering this predicament is trial and error. The
typical beginner's inquiry "What's the best keyword density for AltaVista?",
understandable and basically rational as it may be, is best answered with the
fairly frustrating but ultimately precise: "It all depends - your mileage may
vary." It is only by experimenting with keyword densities under standardized,
comparable conditions yourself that you will be able to come to significant and
viable conclusions.
To get going, here are some links to pertinent programs that will help you
determine (and, in one case, even generate) keyword densities.
KeyWord Density Analyzer (KDA):
An all-time classic of client-based keyword density software is Roberto Grassi's
powerful KeyWord Density Analyzer (KDA).
Concordance:
Concordance is a powerful client-based text analysis tool for making word lists
and concordances from electronic texts.
Fantomas keyMixer(TM):
Our own fantomas keyMixer(TM) is the world's first automatic keyword density
generator, enabling you to create web pages with ultra-precise densities to the
first decimal place.