Avoiding the index: robots.txt and the META robots tag
WARNING: The robots.txt file is not a shield against unauthorized entry. Do not post material to the Internet that absolutely must not be seen by an unauthorized person. This standard is not backed by an official organization or covered by law. Keep in mind that while most major search engines respect this standard, a crawler can simply choose not to follow it and access the file anyway.

The robots.txt file

The "Robot Exclusion Standard" specifies a protocol that lets site administrators direct the actions of the so-called "robots" that crawl the Web and index Web sites. You do this with a small text file named "robots.txt" that contains the instructions you post for visiting robots. You can exclude a particular crawler, or all crawlers that follow the standard, from your entire site, from particular directories, or from particular files.

This file needs to be placed in the top level of your server's document space, so if your site is hosted at an ISP, you may need to ask the ISP's webmaster for help with this.

If you decide to use robot exclusion, keep in mind that Web server software often comes with a directory indexing feature. If your server has that feature and it is enabled, any crawler that comes to your site could grab everything right out of the index, even if you had set up robot exclusion. So the first thing to do is shut off the directory indexing feature.

To exclude your site from all web crawlers, create a file named robots.txt that states:
User-agent: *
Disallow: /
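The conventional record for excluding every compliant crawler from a whole site is "User-agent: *" followed by "Disallow: /". As a rough way to check how a standard-following crawler would read such a file, the sketch below feeds it to Python's urllib.robotparser module. The host www.example.com, the agent name "AnyBot", and the helper name is_allowed are assumptions made for illustration only; they do not come from the article.

```python
from urllib.robotparser import RobotFileParser

# Record that excludes all compliant robots from the entire site.
EXCLUDE_ALL = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt, agent, url):
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # "AnyBot" and www.example.com are placeholder names.
    print(is_allowed(EXCLUDE_ALL, "AnyBot", "http://www.example.com/index.html"))
    print(is_allowed(EXCLUDE_ALL, "scooter", "http://www.example.com/"))
```

This only models a crawler that chooses to honor the file; it says nothing about robots that ignore the standard.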
To exclude just one crawler (e.g. AltaVista's "Scooter"), your file should read:
User-agent: scooter
Disallow: /
To limit the exclusion to a particular directory or file, put its path after Disallow:. For instance,
User-agent: *
In this example, two directory paths under the server root are excluded. You need a separate Disallow line for every path you want to exclude, and you may not have blank lines inside a record, because blank lines are used to delimit multiple records. The "*" in the User-agent field is a special value meaning "any robot"; it cannot be used anywhere else in the record.
To allow a single robot complete access and exclude all others:
User-agent: Lycos
Disallow:

User-agent: *
Disallow: /
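The per-directory and single-robot cases described above can be tried out in the same way. The following is a minimal sketch using Python's urllib.robotparser. The directory names /cgi-bin/ and /private/, the host www.example.com, and the agent name "OtherBot" are hypothetical stand-ins, not the paths or names from the original examples. The second rule set assumes the usual convention that an empty Disallow line grants full access.

```python
from urllib.robotparser import RobotFileParser

SITE = "http://www.example.com"  # hypothetical host

# Two excluded directories, one Disallow line per path (paths are made up).
DIRECTORY_RULES = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /private/
"""

# Two records separated by a blank line: Lycos may fetch everything,
# every other compliant robot is excluded from the whole site.
SINGLE_ROBOT_RULES = """\
User-agent: Lycos
Disallow:

User-agent: *
Disallow: /
"""


def allowed(robots_txt, agent, path):
    """Return True if `agent` may fetch `path` on SITE under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, SITE + path)


if __name__ == "__main__":
    for path in ("/index.html", "/private/answers.html", "/cgi-bin/search"):
        print("directory rules:", path, allowed(DIRECTORY_RULES, "OtherBot", path))
    for agent in ("Lycos", "OtherBot"):
        print("single-robot rules:", agent, allowed(SINGLE_ROBOT_RULES, agent, "/index.html"))
```

Note how the blank line in the second rule set is what separates the Lycos record from the catch-all record, which is why blank lines are not permitted inside a single record.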
Excluding search crawlers from specific files can give you a way to assert some control over the visitor's experience at your site. For instance, if you wanted to hold a trivia contest, you could put robot exclusion on the pages with the answers, so people wouldn't be able to find those pages randomly -- they'd only find the pages with the questions.

One last point: not all robots adhere to the Robot Exclusion Standard, so if you have material you really want to keep away from all search engines, you should consider arranging for some kind of password protection.

Additional Information
Meta Tag Placement
<html>
Meta Tag Structure
NOTE: The name of the tag and the content are not case sensitive.
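To illustrate placement and case-insensitivity, here is a small sketch that reads a robots META tag out of a page's head section using Python's html.parser module. The sample page, the class name RobotsMetaParser, and the helper robots_directives are hypothetical. The values NOINDEX and NOFOLLOW are the commonly used directives and are assumed here, since the article's own sample tag is not shown.

```python
from html.parser import HTMLParser

# Hypothetical page: the robots META tag sits inside <head>, written in
# upper case to show that name and content are matched case-insensitively.
SAMPLE_PAGE = """\
<html>
<head>
<title>Trivia answers</title>
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
</head>
<body>Answers go here.</body>
</html>
"""


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> tags found in <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names; values are left as-is.
        if tag == "head":
            self.in_head = True
        elif tag == "meta" and self.in_head:
            attr_map = {name: (value or "") for name, value in attrs}
            if attr_map.get("name", "").lower() == "robots":
                for token in attr_map.get("content", "").split(","):
                    token = token.strip().lower()
                    if token:
                        self.directives.add(token)

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def robots_directives(html):
    """Return the set of lower-cased robots directives declared in the page head."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


if __name__ == "__main__":
    print(sorted(robots_directives(SAMPLE_PAGE)))
```

A crawler that honors the tag would treat the sample page as not to be indexed and its links as not to be followed; as with robots.txt, nothing forces a robot to comply.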
© 1999-2007 PositionCare.com. All Rights Reserved.