Database schema

The complete database schema used by mnoGoSearch is defined by the SQL scripts in the /create subdirectory of the mnoGoSearch sources.

Table 13-2. server table schema

Column name - Purpose

rec_id - A unique record identifier.
enabled - Indicates whether indexer loads or ignores this record. Use this flag to disable entries temporarily.
url - The URL for a Server record, or a URL pattern for a filter record (Allow, Disallow, etc.).
tag - Tag value, used to limit searches by Tag.
command

'S' - this record defines a server.

'F' - this record defines a filter.

ordre - Sort key. indexer fetches the records from the "server" table in the order of this key. Give the entries for subdirectories smaller ordre values than the entry for the entire server.
parent - Use 0 for your own entries. A non-zero value N in this column indicates that indexer collected this record automatically and that it refers to the parent record with url_id=N.
weight - The weight of this record for PopRank calculation.
pop_weight - The weight of the outgoing links of this server. This value is calculated automatically. Manual changes have no effect.
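
The role of the enabled and ordre columns can be illustrated with a small sketch. It uses Python's sqlite3 module and a simplified stand-in for the server table: the column names come from the table above, but the column types, the defaults, the example.com URLs, and the query itself are assumptions made for illustration, not the statements that indexer actually issues. The authoritative definitions are in the SQL scripts shipped with the sources.

```python
import sqlite3

# Simplified stand-in for the "server" table. Column names follow the
# table above; types and defaults are assumptions for this sketch only.
SCHEMA = """
CREATE TABLE server (
    rec_id     INTEGER PRIMARY KEY,
    enabled    INTEGER NOT NULL DEFAULT 1,
    url        TEXT    NOT NULL,
    tag        TEXT    NOT NULL DEFAULT '',
    command    TEXT    NOT NULL DEFAULT 'S',
    ordre      INTEGER NOT NULL DEFAULT 0,
    parent     INTEGER NOT NULL DEFAULT 0,
    weight     REAL    NOT NULL DEFAULT 1,
    pop_weight REAL    NOT NULL DEFAULT 0
)
"""


def load_enabled_records(conn):
    """Return (command, url) pairs for enabled records, sorted by ordre."""
    cursor = conn.execute(
        "SELECT command, url FROM server "
        "WHERE enabled = 1 ORDER BY ordre, rec_id"
    )
    return cursor.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany(
    "INSERT INTO server (url, command, ordre, enabled) VALUES (?, ?, ?, ?)",
    [
        # Entry for the entire server: larger ordre value.
        ("http://www.example.com/", "S", 20, 1),
        # Entry for a subdirectory: smaller ordre value, so it sorts first.
        ("http://www.example.com/docs/", "S", 10, 1),
        # Temporarily disabled entry: skipped by the query.
        ("http://www.example.com/old/", "S", 5, 0),
    ],
)

records = load_enabled_records(conn)
for command, url in records:
    print(command, url)
```

In this sketch the subdirectory entry is returned before the entry for the whole server, and the disabled entry is not returned at all.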

The other server parameters are stored in the srvinfo table. Possible values for some parameters are given in the table below. Most of them have an effect similar to that of the corresponding indexer.conf commands.
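
As a rough illustration of this name/value layout, the sketch below stores a few parameters in an in-memory SQLite table and reads them back as a dictionary. The sname and sval column names are the ones used in this section. The srv_id column that links a parameter to a server record, the column types, and the sample values are assumptions, so check the SQL scripts in the sources for the real definition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumed layout: one row per (server record, parameter name).
# srv_id is a hypothetical link to server.rec_id; sname and sval hold
# the parameter name and its value as text.
conn.execute("CREATE TABLE srvinfo (srv_id INTEGER, sname TEXT, sval TEXT)")
conn.executemany(
    "INSERT INTO srvinfo (srv_id, sname, sval) VALUES (?, ?, ?)",
    [
        (1, "Period", "86400"),   # sample value: one day, in seconds
        (1, "MaxHops", "3"),
        (2, "Period", "604800"),  # sample value: one week, in seconds
    ],
)


def server_params(conn, srv_id):
    """Collect all sname/sval pairs of one server record into a dict."""
    cursor = conn.execute(
        "SELECT sname, sval FROM srvinfo WHERE srv_id = ?", (srv_id,)
    )
    return dict(cursor.fetchall())


params = server_params(conn, 1)
print(params)
```

Values come back as strings in this sketch, so a numeric parameter such as Period would need an explicit int() conversion before use.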

Table 13-3. Server parameters in the table srvinfo.

sname value - Possible sval values

Alias - An alias used for the URL, in case of a server definition.
Period - Reindexing period in seconds.
RemoteCharset - Default character set value.
DefaultLang - Default language value.
DetectClones - Indicates whether to detect clones for this site.
Request.Authorization - Used for basic authorization.
Request.Proxy - Proxy server used to access documents from this resource.
Request.Proxy-Authorization - Proxy server authorization.
MaxHops - Maximum depth of the path, in mouse clicks, from the start URL.
Index

yes - the content of this site should be indexed.

no - the site content should not be indexed, but the outgoing links should be collected.

Follow - Corresponds to the Subsection argument of the Server command.

0 - "page"

1 - "path"

2 - "site"

3 - "world"

Robots - Indicates whether robots.txt should be downloaded and processed for this site.
MaxNetErrors - Maximum number of network errors for this server.
NetErrorDelayTime - Crawler delay time when a network error occurs for this server.
ReadTimeOut - Network timeout value.
match_type

=0, UDM_MATCH_FULL - full match (Server page).

=1, UDM_MATCH_BEGIN - pattern is a URL prefix (Server path).

=2, UDM_MATCH_SUBSTR - pattern is a URL substring.

=3, UDM_MATCH_END - pattern is a URL suffix.

=4, UDM_MATCH_REGEX - pattern is a regular expression (Realm regex).

=5, UDM_MATCH_WILD - a wildcard pattern with * and ? wildcards (Realm string).

=6, UDM_MATCH_SUBNET - not yet supported.

case_sense

1 - case insensitive match.

0 - case sensitive match.

nomatch

1 - URLs not matching this record are accepted.

0 - URLs matching this record are accepted.
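
The way match_type, case_sense, and nomatch combine can be sketched in Python as follows. This is an illustration of the semantics described above, not the mnoGoSearch implementation. The function names are hypothetical, Python's re module stands in for whatever regular expression library mnoGoSearch is built with, and treating a wildcard pattern as a match against the whole URL is an assumption. UDM_MATCH_SUBNET is left out because it is listed as not yet supported.

```python
import re

# Numeric match_type codes as listed in the table above.
UDM_MATCH_FULL, UDM_MATCH_BEGIN, UDM_MATCH_SUBSTR = 0, 1, 2
UDM_MATCH_END, UDM_MATCH_REGEX, UDM_MATCH_WILD = 3, 4, 5


def wildcard_to_regex(pattern):
    """Translate a pattern with * and ? wildcards into a regex string."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")   # any run of characters
        elif ch == "?":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


def pattern_matches(url, pattern, match_type, case_sense):
    """Return True if url matches pattern under the given match_type."""
    insensitive = case_sense == 1  # per the table: 1 is case insensitive
    if match_type in (UDM_MATCH_REGEX, UDM_MATCH_WILD):
        flags = re.IGNORECASE if insensitive else 0
        if match_type == UDM_MATCH_WILD:
            # Assumption: a wildcard pattern must cover the whole URL.
            regex = wildcard_to_regex(pattern)
            return re.fullmatch(regex, url, flags | re.DOTALL) is not None
        return re.search(pattern, url, flags) is not None
    if insensitive:
        url, pattern = url.lower(), pattern.lower()
    if match_type == UDM_MATCH_FULL:
        return url == pattern
    if match_type == UDM_MATCH_BEGIN:
        return url.startswith(pattern)
    if match_type == UDM_MATCH_SUBSTR:
        return pattern in url
    if match_type == UDM_MATCH_END:
        return url.endswith(pattern)
    raise ValueError(f"unsupported match_type: {match_type}")


def record_accepts(url, pattern, match_type, case_sense=0, nomatch=0):
    """Apply nomatch: 1 accepts non-matching URLs, 0 accepts matching ones."""
    matched = pattern_matches(url, pattern, match_type, case_sense)
    return (not matched) if nomatch == 1 else matched


print(record_accepts("http://www.example.com/docs/a.html",
                     "http://www.example.com/docs/", UDM_MATCH_BEGIN))
print(record_accepts("http://www.example.com/a.pdf",
                     "*.pdf", UDM_MATCH_WILD, nomatch=1))
```

The first call checks a URL prefix, as a Server path entry would. The second call uses nomatch=1, so the record accepts only URLs that do not match the wildcard pattern.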