Search engines don't index everything. In fact, features that
Web designers add to sites at great expense may block crawlers, meaning
that those pages will never be indexed and never be found through search
engines. As a result, those sites may end up spending far more on promotion
than they would have had to otherwise.
By paying attention to how crawlers and search engines work, you can
get more traffic at far less cost.
Tips:
- Sites that require any kind of registration or password lock
out search engines. Keep in mind that a web crawler cannot fill out
a form of any kind. If a visitor must fill out a form to get to the next
page, the crawler halts right there. If you would like to gather information
about your users or members, but would also like your pages to be
indexed, make the registration optional.
- A crawler cannot get content from a database that is reachable only
through a query form, because it cannot fill out the form.
- If the content of your database is largely text, you might consider
creating plain-text static HTML pages with that same content,
so they can be indexed and found.
- Dynamic pages also block Web crawlers. While it's great to
give visitors unique experiences, tailored to their needs, the techniques
you use to do that could stop search engines from indexing your content
and hence could greatly reduce your potential traffic. Dynamically
generated pages are created on the fly from a variety of elements
held in databases. Typically such pages have a question mark (?) in
the URL. When a search engine crawler arrives at such a page, it captures
the content but then halts and does not follow the links,
because it sees ahead of it a potentially infinite number of pages -- a black
hole that could cause it to crash.
- Active Server Pages (.asp) with question marks in their URLs
(indicating that the page is a script for the construction of a page,
rather than just static content) are not indexed.
By the way, this is one reason why nobody can say how many pages there
are on the Web in total. Every dynamic site potentially has an infinite
number of pages. And how many millions of dynamic sites are there?
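The question-mark rule described above can be sketched in a few lines of Python. This is only an illustration of the heuristic as this article describes it, not the logic of any particular search engine; the function names `looks_dynamic` and `crawlable_links` and the example.com URLs are made up for the example.

```python
from urllib.parse import urlsplit


def looks_dynamic(url: str) -> bool:
    """Return True if the URL has a non-empty query string after a '?'."""
    # Assumption: a query string is treated as the sign of a dynamic page,
    # following the rule of thumb in the text above.
    return bool(urlsplit(url).query)


def crawlable_links(urls):
    """Keep only the links that a cautious crawler of this kind would follow."""
    return [u for u in urls if not looks_dynamic(u)]


if __name__ == "__main__":
    links = [
        "http://example.com/about.html",
        "http://example.com/catalog.asp?item=42",
    ]
    for link in links:
        label = "dynamic" if looks_dynamic(link) else "static"
        print(label, link)
```

Running a check like this over your own site's links gives a rough idea of how many pages a crawler with this rule would skip.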
- If you have information inside frames, that will probably
prove to be a hindrance, but it is not an absolute barrier. Search engines
index the outer page that defines the frames as a distinct page. They also index
each pane of the frame window as a separate page. That means that
if the content matching a query is in a pane, visitors who click
on that search result will see only the pane, not the full page as it was
originally designed. So if you want visitors from search engines to
experience your pages in a certain way, you should have non-frames
as well as frames versions of those pages, and submit the
non-frames versions.
- Crawlers also can't index text that is embedded in graphics.
Have you ever been to a site that has a huge picture that takes minutes
to paint across your screen, with all the words embedded in that picture?
Search engines simply cannot "see" the text unless the webmaster adds
ALT text to the picture, describing it and listing those
important words.
- Text that appears in multimedia files (audio and video) cannot
be indexed.
- Information that is generated by Java applets or in XML coding
cannot be indexed.
- Acrobat (PDF) files cannot be indexed either. If you need that
content to be found, you should provide plain HTML versions of those pages
and point the crawler to those.
- Comments, that is, text between <!-- and --> symbols
in the source code, aren't indexed by all engines.
- Also, consider technical factors. If a site has a slow connection
or the pages are very complex, the crawler might time out before it
can index all the text.
- If you have a hierarchy of directories at your site, put the most
important information high, not deep. Search engines will presume
that information placed higher is more important. And crawlers
may not venture deeper than three, four, or five directory levels.
- In addition, it helps to have a central page with good navigation
to the other pages at your site. Make it easy, not hard, for the crawler
to find all your pages by following internal links.
- Your rule of thumb should be to have at least one full set of your
content available in a form that the blind can read. The blind are
some of the best users of the Internet today. They use text-only browsers
and text-to-voice converters, and they are able to navigate very well
unless people put up barriers. The same kinds of barriers that
stop the blind also stop Web crawlers. Label pictures clearly
with ALT text that explains what a sighted person
would see. And by designing your site to accommodate the needs of
search engine crawlers, you will probably also make sure that your
site complies with the provisions and the spirit of the Americans
with Disabilities Act. Check your sites for accessibility with
Watchfire WebXACT.
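As a rough self-check for the ALT text advice in the tips above, a short Python sketch can list the images on a page that have no usable ALT text. It uses only the standard library's `html.parser`; the class name `MissingAltFinder`, the function `images_missing_alt`, and the sample markup are hypothetical, and the check is far simpler than what a dedicated accessibility tool would do.

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Collect the src of every <img> tag whose ALT text is absent or blank."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        alt = attributes.get("alt")
        # An alt attribute with no value, or only whitespace, counts as missing.
        if alt is None or not alt.strip():
            self.missing.append(attributes.get("src") or "(no src)")


def images_missing_alt(html: str) -> list:
    """Return the image sources in the given HTML that lack usable ALT text."""
    finder = MissingAltFinder()
    finder.feed(html)
    finder.close()
    return finder.missing


if __name__ == "__main__":
    page = (
        '<img src="logo.gif" alt="Company logo">'
        '<img src="banner.gif">'
    )
    print(images_missing_alt(page))
```

An empty result does not mean the ALT text is good, only that it is present; whether it actually describes the picture and lists the important words still needs a human reader.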