Robots

mnoGoSearch 3.4.1 reference manual: Full-featured search engine software

Robots

Name

Robots -- defines whether to respect robots.txt and robot directives in HTTP headers, meta tags, and link attributes.

indexer.conf

Synopsis

Robots {yes | no | robotstxt | xrobotstag | meta | rel}...

Description

Robots defines which robot directives should be respected when crawling:

robotstxt - respect the Robots exclusion standard (robots.txt files).
xrobotstag - respect the X-Robots-Tag HTTP header.

meta - respect HTML robots meta tags, e.g.:

<META NAME="robots" CONTENT="noindex,nofollow,noarchive">

rel - respect the rel attributes in HTML links, e.g.:
```
<a href="http://www.site.com/" rel="nofollow">
```
yes - respect all robot directives.
no - ignore all robot directives.

Setting Robots to no can be useful when running mnoGoSearch for site validation purposes, as well as when crawling your own Web site.

The default value is yes, so that crawling is polite unless configured otherwise.

Scope

Robots can be used multiple times. Each occurrence affects all following Server and Realm commands until the next Robots command or the end of the configuration file.
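
As a sketch of this scoping rule, the fragment below switches the setting between groups of Server and Realm commands. The host names are placeholders chosen for illustration and do not come from the manual.

```
# No Robots command yet: the default (yes) applies
Server http://www.example.com/

# From here on, ignore all robot directives
Robots no
Server http://intranet.example.org/
Realm http://docs.example.org/*

# Respect all robot directives again for everything that follows
Robots yes
Server http://www.example.net/
```

In this sketch only the two commands between Robots no and Robots yes are crawled without regard to robot directives. The first and last Server commands are crawled politely.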

Examples

# Respect all directives except the rel attribute
Robots robotstxt xrobotstag meta

# Ignore all robot directives
Robots no
