Thornton 2 Library of Scraps
1K489

Using a robots.txt File With AMTSWG (README.robotstxt)

Read the README and README.more files first.

See http://www.robotstxt.org/ for information about how to exclude certain parts of your website from being crawled by search engine spiders.

If you use a robots.txt file, save it at the top level of PUB_DIR ("pub/" by default) so that it is served as /robots.txt. Crawlers only look for it at the root of your site.

The basic structure is one or more records, each a group of consecutive lines (with a blank line between records) in the format:

User-agent: <robot-name>
Disallow: /<directory-or-file>
Disallow: /<directory-or-file>
Disallow: /<directory-or-file>
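As a sketch, assuming the default PUB_DIR of "pub/", such a file could be written from the shell with a here-document. The /private/ directory and /drafts/old.html file are hypothetical placeholders for whatever you want to exclude:

```shell
# Assumes the default PUB_DIR ("pub/"); change PUB_DIR if you changed it.
PUB_DIR=pub
mkdir -p "$PUB_DIR"

# Quoting 'HERE' keeps the shell from expanding anything in the text.
# One record: a User-agent line followed by its Disallow lines.
# /private/ and /drafts/old.html are hypothetical example paths.
cat > "$PUB_DIR/robots.txt" << 'HERE'
User-agent: *
Disallow: /private/
Disallow: /drafts/old.html
HERE
```

The single ">" overwrites any existing robots.txt, so only use this form when starting a new file.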

For example, to block every search engine spider from crawling any page on your site, use:

User-agent: *
Disallow: /

Or, to allow every search engine spider to crawl the whole site instead, leave the Disallow value empty:

User-agent: *
Disallow:

If you have "secret" files or directories, you can ask search engines not to crawl them by naming each one on its own "Disallow: " line. The trade-off is that robots.txt is publicly readable: any human who reads it, and any robot that ignores its directives, will learn that those files and directories exist.

Isaac Asimov's Three Laws of Robotics

This is only a suggestion, and it won't do anything except amuse fans of Isaac Asimov's robot-based stories. Feel free to skip the entire rest of this README.robotstxt file if you're not a fan.

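After you create your robots.txt file, add the following "Disallow: " lines to it. Like any Disallow line, they must belong to a record that begins with a User-agent line, such as "User-agent: *":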

Disallow: /1/humans/harm
Disallow: /2/orders/ignore_human
Disallow: /3/admin/self_harm
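A minimal sketch of appending these lines from the shell, assuming the default PUB_DIR and that the last record in your robots.txt (for example a single "User-agent: *" record) is the one you want them in, with no blank line at the end of the file:

```shell
# Assumes the default PUB_DIR ("pub/"); change it if yours differs.
PUB_DIR=pub
ROBOTS="$PUB_DIR/robots.txt"
mkdir -p "$PUB_DIR"

# If there is no robots.txt yet, start a single record for every robot,
# so the Disallow lines below follow a User-agent line.
[ -f "$ROBOTS" ] || echo 'User-agent: *' > "$ROBOTS"

# If the file's last character is not a newline, add one so the first
# appended line does not run into the existing last line.
[ -z "$(tail -c 1 "$ROBOTS")" ] || echo >> "$ROBOTS"

# Append the three paths to the end of the file, i.e. to its last record.
cat >> "$ROBOTS" << 'HERE'
Disallow: /1/humans/harm
Disallow: /2/orders/ignore_human
Disallow: /3/admin/self_harm
HERE
```

If your robots.txt has several records or ends with a blank line, add the lines by hand to the right record instead.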

Type the following commands, substituting your actual PUB_DIR value for "pub/" if you changed it. The leading "$" and ">" characters are shell prompts, not part of what you type:

$ mkdir -p pub/1/humans
$ mkdir -p pub/2/orders
$ mkdir -p pub/3/admin
$ ( cat << HERE
> First Law of Robotics: A robot may not injure a human being or,
> through inaction, allow a human being to come to harm.
>
> - "Handbook of Robotics, 56th Edition" - 2058 AD
> HERE
> ) >> pub/1/humans/harm
$ ( cat << HERE
> Second Law of Robotics: A robot must obey the orders given it by
> human beings except where such orders would conflict with the
> First Law.
>
> - "Handbook of Robotics, 56th Edition" - 2058 AD
> HERE
> ) >> pub/2/orders/ignore_human
$ ( cat << HERE
> Third Law of Robotics: A robot must protect its own existence as
> long as such protection does not conflict with the First or Second
> Law.
>
> - "Handbook of Robotics, 56th Edition" - 2058 AD
> HERE
> ) >> pub/3/admin/self_harm
$ echo "Zeroth Law: A robot may not harm humanity." >> pub/0
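To confirm that each Disallow path now names something that actually exists under PUB_DIR, a small shell function could help. check_disallows below is a hypothetical helper, not part of AMTSWG; it checks only plain paths and does not understand the wildcard patterns some crawlers support:

```shell
# check_disallows: hypothetical helper, not part of AMTSWG.
# Usage: check_disallows [PUB_DIR]   (defaults to "pub")
check_disallows() {
  pub_dir=${1:-pub}
  if [ ! -f "$pub_dir/robots.txt" ]; then
    echo "no robots.txt in $pub_dir" >&2
    return 1
  fi
  # Take every "Disallow: /path" line; an empty "Disallow:" (allow all)
  # does not match the pattern and is skipped.
  grep '^Disallow: /' "$pub_dir/robots.txt" |
  while read -r field path; do
    # The URL path maps onto PUB_DIR, e.g. /1/humans/harm -> pub/1/humans/harm.
    if [ -e "$pub_dir$path" ]; then
      echo "found   $path"
    else
      echo "missing $path"
    fi
  done
}
```

For example, run "check_disallows pub" after the commands above; with the three Asimov lines in your robots.txt, each of their paths should be reported as found.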

~ArielMT