Searching documents

mnoGoSearch 3.4.1 reference manual: Full-featured search engine software

Performing search

Open your preferred front-end in a Web browser:

http://your.web.server/path/to/search.cgi

or

http://your.web.server/path/to/search.php

or

http://your.web.server/path/to/search.pl

To start a search, type the words you want to find and press the SUBMIT button. For example, ``MySQL ODBC''. mnoGoSearch will find documents having the words MySQL and/or ODBC. The best matching documents will be displayed at the top of the search results.

Note: The quote signs `` and '' are not part of the search query. They are used in this example and in the other examples given in the manual to separate search queries from the surrounding text.

Note: mnoGoSearch works case insensitively. The case of the letters in a search query does not matter.

Search parameters

mnoGoSearch front-ends support the following CGI query string parameters (which can also be used as HTML search form variables).

Note: Search parameters can also be set using the ReplaceVar command.

Table 11-1. Available search parameters

q	text parameter with the search query words
s	a character sequence specifying the result sorting order. Small letters mean ascending order, capital letters mean descending order. The following letters are understood: `R` or `r` - for sorting by score, `P` or `p` - for sorting by Popularity Rank, `D` or `d` - for sorting by modification date, `U` or `u` - for sorting by URL, `S` or `s` - for sorting by a user defined section (see also the `su` parameter). The default value is `R`, which means sorting in descending score order.
su	the user defined section name to sort results when `s=S` or `s=s` is given. Note: Use the UserOrder command to improve performance of sorting by a user defined section.
sl.*	a section limit. You can limit searches using a certain value of a desired section. For example, `sl.title=Top` will only search among the documents having `title` equal to `Top`. Section values support SQL wildcards `%` and `_`. <SELECT NAME="sl.title" MULTIPLE> <OPTION VALUE="%2008%">2008</OPTION> <OPTION VALUE="%2007%">2007</OPTION> </SELECT> The above code in the HTML search form will limit searches to the documents having the substrings `2007` or `2008` in their titles, according to the user's choice.
fl	Loads a fast limit with the given name pattern. The limit should be previously defined using the Limit command. If the `fl` value starts with a minus character (`-`), the limit is treated as an excluding limit. For example, `fl=-name` restricts search to the documents not covered by the limit `name`. The SQL LIKE operator is used when loading fast limits at search time, so the `%` and `_` wildcards can be used in the `fl` pattern. If the pattern matches multiple limits, search is restricted to the documents covered by any of them. If an excluding limit pattern matches multiple limits, search is restricted to the documents covered by none of them.
ps	page size, the number of documents displayed on one page, `10` by default.
np	the current page number, `0` by default (the first page)
offs	search result start point (offset). `0` by default (meaning display starting from the first document). `offs` is an alternative way to set the desired offset. `np=2&ps=10` is effectively the same as `offs=20&ps=10`, and both mean display `10` documents starting from document `21`. If both `offs` and `np` are specified, then `np` is ignored. Note: Using `offs` you can display results starting from an arbitrary offset, even in the "middle" of a page, for example: `offs=5&ps=10` means display `10` documents starting from document `6`.
m	search mode. `all` and `any` values are supported. The default value is `all`.
wm	word match type. The available values are `wrd`, `beg`, `end` and `sub`, meaning whole word, word beginning, word ending and word substring match respectively, with the whole word match type by default. The minimum word length for substring match is controlled by the SubstringMatchMinWordLength command in `search.htm`. See also the Section called Substring search notes in Chapter 7.
t	A Tag limit. Limits search to the documents with the given tag only. This parameter has a similar effect to the `-t` option on the indexer command line.
ul	A URL limit. Limits search results by a URL pattern. If the `ul` value represents a relative URL, then search.cgi automatically adds `%` wildcards before and after the `ul` value. For example: <OPTION VALUE="/manual/"> will add the `(url LIKE '%/manual/%')` condition into the SQL query. If the `ul` value is an absolute URL with a scheme, then search.cgi will add the `%` sign only at the end of the value. For example, for: <OPTION VALUE="http://localhost/"> search.cgi will add the `(url LIKE 'http://localhost/%')` condition. Note: Using an absolute URL is more efficient as it can use SQL indexes for optimization. In addition to the automatically added wildcards, you can use your own `%` and `_` wildcards in the pattern. For example: <OPTION VALUE="http://localhost/%/archive/"> Multiple `ul` values can be given in the query string, which makes it possible to use a `SELECT MULTIPLE` input type in the HTML search form. Multiple values are joined using the `OR` condition. For example, when a user selects both options from this list: <SELECT NAME="ul" MULTIPLE> <OPTION VALUE="/dir1/">Dir1</OPTION> <OPTION VALUE="/dir2/">Dir2</OPTION> </SELECT> search.cgi will add the `(url LIKE '%/dir1/%' OR url LIKE '%/dir2/%')` condition into the search query.
ue	Limits the search results by excluding the documents matching the given URL pattern. The `ue` parameter detects absolute and relative URL patterns and automatically adds wildcards, and supports your own wildcards, similarly to the `ul` parameter. Multiple `ue` parameters are also understood to exclude multiple URL patterns at the same time. Multiple parameters are joined using the `AND` SQL operator. For example, when a user selects both options from this list: <SELECT NAME="ue" MULTIPLE> <OPTION VALUE="/dir1/">Dir1</OPTION> <OPTION VALUE="/dir2/">Dir2</OPTION> </SELECT> search.cgi will add the `(url NOT LIKE '%/dir1/%' AND url NOT LIKE '%/dir2/%')` condition into the search query. Note: The `ul` and `ue` parameters can be given at the same time.
wf	A weight factor vector. It allows changing the weights of the different document sections at search time. The `wf` value should be passed in the form of a hexadecimal number. Check the explanation below.
nwf	A No section weight factor vector. See the explanation below.
g	A language limit to find documents only in the given language. The value should be a two-letter language abbreviation. Have a look into the Section called Indexing multilingual servers in Chapter 9 for details. An HTML form example: <SELECT NAME="g"> <OPTION VALUE="">All languages <OPTION VALUE="en">English <OPTION VALUE="de">German <OPTION VALUE="ru">Russian </SELECT>
tmplt	The search template file name (without path), to specify the template file to use instead of the default file `search.htm`.
type	A Content-Type limit to find documents with the given type, for example `application/pdf`. Multiple `type` parameters can be passed in the same query. SQL LIKE patterns are also understood.
sp	Defines whether to use stemming. `sp=1` tells search.cgi to use the Ispell commands given in `search.htm`. `sp=0` makes search.cgi ignore all Ispell commands and therefore return only the exact word forms entered by the user. The default value is `1`. See the Section called Ispell for details.
sy	Defines whether to use synonyms. `sy=1` allows using the synonym type of fuzzy search. `sy=0` makes search.cgi ignore all synonym-related commands. The default value is `1`.
tl	Defines whether to use the transliteration type of fuzzy search. `tl=yes` or `tl=1` means to use transliteration. `tl=no` or `tl=0` means to switch transliteration off. The default value is `0`.
dt	A time limit. Three time limit types are supported. `dt=back` limits the result to recent documents, modified within the period of time between `now` and back to the past up to the given period of time. The period is to be passed using the `dp` parameter. If `dt=er` is given, then search results are limited to the documents newer or older than the given date value. `dx=1` means `newer` (or `after`). `dx=-1` means `older` (or `before`). The date value is specified using the `dy`, `dm`, and `dd` parameters. If `dt=range` is given, then search returns documents modified within the given date range. The parameters `db` and `de` are used to pass the first and the last dates.
dp	A "recentness" limit. To be used in combination with `dt=back`. `dp` should be specified using the `xxxA[yyyB[zzzC]]` format. `xxx`, `yyy`, `zzz` are numbers (can be negative!). `A`, `B`, `C` are field descriptors, similar to the descriptors the `strptime()` and `strftime()` C functions use, with the following meaning: `s` - second, `M` - minute, `h` - hour, `d` - day, `m` - month, `y` - year. For example: `4h30M` - 4 hours and 30 minutes; `1y6m-15d` - 1 year and six months minus 15 days; `1h-60M+1s` - 1 hour minus 60 minutes plus 1 second.
dx	The `newer`/`older` flag. `dx=1` means `newer`. `dx=-1` means `older`. `dx` is to be used together with `dt=er`.
dm	Month (when `dt=er`), starting from `0`: `0` - January, `1` - February, ... , `11` - December.
dy	Year (when `dt=er`), using the four digit format. For example: `dy=2008`.
dd	Day (when `dt=er`), a number in the range `1`...`31`.
db	The beginning date (when `dt=range`), using the `dd/mm/yyyy` format.
de	The end date (when `dt=range`), using the `dd/mm/yyyy` format.
us	Specifies the name of the user defined score list which should be loaded and mixed with the score values internally calculated by mnoGoSearch, according to the UserScore and UserScoreFactor configuration. If `us` is empty, or there is no UserScore command with the given name, `us` is ignored.
ss	Specifies the name of the user defined site score list which should be loaded and mixed with the scores internally calculated by mnoGoSearch, according to the UserSiteScore and UserScoreFactor configuration. If `ss` is empty, or there is no UserSiteScore command with the given name, `ss` is ignored.
GroupBySite	Enables or disables grouping results by site. Can be set to `yes` or `no`, with the default value `no`. This parameter has the same effect as the GroupBySite `search.htm` command.
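
As an illustration of how these parameters combine in a query string, the following Python sketch builds a search URL. The base URL is the placeholder used at the start of this chapter, and the helper names `build_search_url` and `page_to_offset` are hypothetical; they are not part of mnoGoSearch.

```python
from urllib.parse import urlencode

# Placeholder front-end location, taken from the examples earlier in this chapter.
BASE_URL = "http://your.web.server/path/to/search.cgi"


def page_to_offset(np, ps=10):
    """Return the `offs` value equivalent to page number `np` with page size `ps`."""
    return np * ps


def build_search_url(query, **params):
    """Build a search URL; list values are repeated (e.g. several `ul` limits)."""
    pairs = [("q", query)]
    for name, value in params.items():
        values = value if isinstance(value, (list, tuple)) else [value]
        pairs.extend((name, str(v)) for v in values)
    return BASE_URL + "?" + urlencode(pairs)


# Find all words, third page of 10 results, limited to two directories.
url = build_search_url("mysql odbc", m="all", ps=10, np=2, ul=["/dir1/", "/dir2/"])
print(url)
print(page_to_offset(2, 10))  # same starting point as np=2&ps=10
```

Parameter names that are not valid Python identifiers, such as `sl.title`, can be passed with dictionary unpacking, for example `build_search_url("web", **{"sl.title": "Top"})`.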

Changing weights of the different document parts at search time

Changing weights (importance) of the different document parts (sections) is possible with the help of the wf HTML form variable passed to search.cgi.

To be able to use this feature, it is recommended to set different section IDs for different document parts in the Section command in indexer.conf. Currently up to 256 separate sections are supported.

Imagine that we have these default sections in indexer.conf:

Section body        1  256
Section title       2  128
Section keywords    3  128
Section description 4  128

The wf value is a string of hexadecimal digits ABCD, where every digit represents a weight factor for the corresponding section. The rightmost digit corresponds to the section with ID=1. If a weight factor for some section is 0, then this section is totally ignored at search time.

For the section configuration given above:

      D is a factor for section 1 (body)
      C is a factor for section 2 (title)
      B is a factor for section 3 (keywords)
      A is a factor for section 4 (description)

Examples:

    wf=0001 will search through the section body only.

    wf=1110 will search through the sections
    title,  keywords, description.
    The section body will be ignored.

    wf=F421 will search through:
           Description with factor 15 (F hex)
           Keywords with factor 4
           Title with factor 2
           Body with factor 1

It is also possible to set the default wf value using the wf search.htm command. If wf is omitted in the query and there is no wf command defined in search.htm, all section factors are considered to be equal to 1, which means that all sections have the same weight.
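
The digit-to-section mapping can be sketched in Python as follows. The function names `decode_wf` and `encode_wf` are hypothetical helpers for composing form values, not mnoGoSearch functions, and giving unlisted sections a factor of 0 is a choice made for this sketch.

```python
def decode_wf(wf):
    """Map section ID -> weight factor; the rightmost hex digit is section 1."""
    return {sec_id: int(digit, 16)
            for sec_id, digit in enumerate(reversed(wf), start=1)}


def encode_wf(factors):
    """Build a wf string from {section ID: factor 0..15}.

    Sections not listed get factor 0 (ignored at search time); this is an
    assumption of the sketch, not documented behavior.
    """
    if not factors:
        raise ValueError("at least one section factor is required")
    digits = []
    for sec_id in range(max(factors), 0, -1):
        factor = factors.get(sec_id, 0)
        if not 0 <= factor <= 15:
            raise ValueError("each factor must fit in one hexadecimal digit")
        digits.append(format(factor, "X"))
    return "".join(digits)


print(decode_wf("F421"))                # section IDs 1..4 with their factors
print(encode_wf({2: 1, 3: 1, 4: 1}))    # title, keywords, description only
```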

Starting from the version 3.3.0, it is also possible to specify the wf value as a DBAddr search.htm command parameter. This can be useful if you're using multiple DBAddr commands to merge search results from multiple databases and want to give higher or lower score to the results coming from a certain database.

The nwf search parameter uses the same format as wf. If all found words appear only in a single section, then the resulting score becomes lower. It can be used, for example, to ignore spam in the KEYWORDS meta tag. If you use high wf and nwf values for the section corresponding to the KEYWORDS meta tag, then the score will be high only if KEYWORDS match the rest of the document, that is, if the query words appear in KEYWORDS and at the same time in other sections (like title or body). If the query words are found in the section KEYWORDS alone, then the score for this document will be low. Starting from the version 3.3.3, nwf can also be set as a parameter to the DBAddr command in search.htm.

Changing importance of individual query words

The mnoGoSearch search query language allows specifying different importance for individual search query words. The range of possible user-defined importance values is 0-256. The default value is 256 for all query words. You can change the importance of some words using the special keyword importance immediately followed by a number and a colon character:

star wars importance10:movie

In the above example, importance for the words star and wars is 256 (the default values), while importance for the word movie is 10, which makes it less important when ranking found documents.

If you specify importance0: for some query word, for example:

star wars importance0:movie

then this word will be ignored only at ranking time, however this word will still be required if you're doing an m=all search query (i.e. "find all words"). Therefore, in the above example, search will not return documents which don't have the word movie.
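
When queries are composed programmatically, the prefix syntax can be generated with a small helper. This is a minimal Python sketch; `with_importance` is a hypothetical name, and the range check simply mirrors the 0-256 range stated above.

```python
DEFAULT_IMPORTANCE = 256  # default importance of every query word, per the text above


def with_importance(word, importance=DEFAULT_IMPORTANCE):
    """Return the query word, prefixed with `importanceN:` when N is not the default."""
    if not 0 <= importance <= 256:
        raise ValueError("importance must be in the range 0-256")
    if importance == DEFAULT_IMPORTANCE:
        return word
    return f"importance{importance}:{word}"


query = " ".join([
    with_importance("star"),
    with_importance("wars"),
    with_importance("movie", 10),
])
print(query)
```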

Using multiple templates

It is often required to use multiple templates with the same search.cgi. There are a few ways to do it. They are given here in the order search.cgi detects the template name.

search.cgi checks the environment variable UDMSEARCH_TEMPLATE. So you can put the path to the desired search template into UDMSEARCH_TEMPLATE.
search.cgi also supports Apache internal redirect. It checks the REDIRECT_STATUS and REDIRECT_URL environment variables. To start using Apache internal redirect you can add these lines into httpd.conf:
```
AddType text/html .zhtml
AddHandler zhtml .zhtml
Action zhtml /cgi-bin/search.cgi
```
Put search.cgi into your /cgi-bin/ directory. Then put the HTML search templates into your Web server directory using the .zhtml extension, for example template.zhtml. Now you can open the search page by typing this URL in the browser location bar:
```
http://www.site.com/path/to/template.zhtml
```
Instead of .zhtml you can configure any other extension of your choice.
search.cgi also checks the URL part after the "search.cgi" substring, which is available in the PATH_INFO environment variable. For example, if you type http://site/search.cgi/search1.html in your browser, search.cgi will open search1.htm as a template file. If you type http://site/search.cgi/search2.html, it will use search2.htm, and so on.
If the above three ways did not work, search.cgi opens a template which has the same name as the script being executed, by reading the SCRIPT_NAME environment variable value. search.cgi opens the template file ETC/search.htm, search1.cgi opens the template file ETC/search1.htm and so on, where ETC is the mnoGoSearch /etc directory (usually /usr/local/mnogosearch/etc). So, you can create a number of symbolic or hard links to the same search.cgi and open it using its different names.
Note: See also the Section called Search pages with multi-lingual interface in Chapter 9.
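
The detection order described above can be modeled roughly as follows. This Python sketch is an illustration only and not the search.cgi implementation: `pick_template` is a hypothetical name, the redirect case returns the requested URL without mapping it to a file, and looking up the PATH_INFO template in the ETC directory is an assumption.

```python
from pathlib import PurePosixPath

# Usual mnoGoSearch /etc directory, as mentioned in the text above.
ETC_DIR = "/usr/local/mnogosearch/etc"


def pick_template(env):
    """Return the template chosen for a CGI environment, in the documented order."""
    # 1. An explicit path in UDMSEARCH_TEMPLATE wins.
    if env.get("UDMSEARCH_TEMPLATE"):
        return env["UDMSEARCH_TEMPLATE"]
    # 2. Apache internal redirect: the requested URL names the template.
    #    Mapping the URL to a file on disk is not modeled here.
    if env.get("REDIRECT_STATUS") and env.get("REDIRECT_URL"):
        return env["REDIRECT_URL"]
    # 3. The URL part after "search.cgi", e.g. /search1.html -> search1.htm.
    #    Assumption: the file is looked up in ETC_DIR.
    if env.get("PATH_INFO"):
        stem = PurePosixPath(env["PATH_INFO"]).stem
        return f"{ETC_DIR}/{stem}.htm"
    # 4. Fall back to the script name, e.g. search1.cgi -> ETC/search1.htm.
    stem = PurePosixPath(env.get("SCRIPT_NAME", "search.cgi")).stem
    return f"{ETC_DIR}/{stem}.htm"


print(pick_template({"SCRIPT_NAME": "/cgi-bin/search.cgi", "PATH_INFO": "/search1.html"}))
print(pick_template({"SCRIPT_NAME": "/cgi-bin/search2.cgi"}))
```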

Advanced Boolean search

You can compose complex search queries with the help of the Boolean query language.

mnoGoSearch understands the following Boolean operators:

& - logical AND. For example, ``mysql & odbc''. mnoGoSearch will return the documents containing both words mysql and odbc. You can also use + for this operator.

| - logical OR. For example, ``mysql|odbc''. mnoGoSearch will find the documents containing the word mysql, or containing the word odbc.

~ - logical NOT. For example, ``mysql & ~odbc''. mnoGoSearch will find the documents containing the word mysql and not containing the word odbc at the same time. Note that the ~ operator can only exclude the given word from the results. The query ``~mysql & ~odbc'' will return no results.

() - the grouping command to compose more complex queries. For example, ``(mysql | msql) & ~postgres''.

Note: Boolean operators work only in queries having two or more words. search.cgi ignores Boolean operators in queries consisting of a single word. Thus, the query ``~odbc'' will just search for the word odbc without treating the ~ sign as the NOT operator.

Note: Boolean search considers stopwords as found in any document that contains the other search terms from the same query. For example, if ``the'' is a stopword, the query ``(Jana First)|(Michael Second)|the'' will return all documents that have any of the four non-stopword terms and is effectively the same as ``Jana|First|Michael|Second''.

Note: If a search query consists of more than 64 words, Boolean search results are not predictable.
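
To make the operator semantics concrete, here is a small Python sketch that evaluates a Boolean query against the set of words of one document. It is a plain textbook evaluator and not the mnoGoSearch parser: the name `matches` is hypothetical, adjacent words are assumed to be joined with AND, and the special cases from the notes above (single-word queries, queries made only of negations, stopwords, the 64-word limit) are not modeled.

```python
import re

# Operators are single characters; everything else between them is a word.
TOKEN_RE = re.compile(r"[&|~()+]|[^\s&|~()+]+")


def matches(query, doc_words):
    """Return True if the (lowercase) document word set satisfies the query."""
    tokens = TOKEN_RE.findall(query.lower())
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def parse_or():
        result = parse_and()
        while peek() == "|":
            take()
            rhs = parse_and()  # parse first, so the operand is always consumed
            result = result or rhs
        return result

    def parse_and():
        result = parse_not()
        # "&" and "+" are explicit AND; adjacent terms are treated as AND too.
        while peek() not in (None, "|", ")"):
            if peek() in ("&", "+"):
                take()
            rhs = parse_not()
            result = result and rhs
        return result

    def parse_not():
        if peek() == "~":
            take()
            return not parse_not()
        return parse_primary()

    def parse_primary():
        if peek() is None or peek() in ("&", "|", "+", ")"):
            raise ValueError("expected a word or an opening parenthesis")
        tok = take()
        if tok == "(":
            result = parse_or()
            if peek() != ")":
                raise ValueError("missing closing parenthesis")
            take()
            return result
        return tok in doc_words

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected token: " + tokens[pos])
    return result


print(matches("mysql & ~odbc", {"mysql", "jdbc"}))
print(matches("(mysql | msql) & ~postgres", {"msql", "postgres"}))
```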

Restricting search words to a section

Starting from the version 3.2.39, mnoGoSearch understands section name references. For example, ``title:web body:server'' will find the documents having the word web in their titles and at the same time the word server in their bodies. To make search.cgi recognize section names, you need to copy the desired Section commands from indexer.conf to search.htm.

Note: Section name references can be combined with Boolean operators.

Phrase search

Phrase search is activated by using quote characters around the words. For example, the query ``"search engine"'' will return the documents having the word search immediately followed by the word engine, while the query ``search engine'' (i.e. without the surrounding quotes) will not require the words to be close to each other.

Note: It is possible to combine two or more phrases in the same query, as well as combine phrases with Boolean operators.

Starting from the version 3.2.39, automatic phrase search is forced for complex words having dots, dashes, underscores, commas and slashes (- _ . , /) as delimiters between the word parts. For example, the query ``max_allowed_packet'' automatically searches for the phrase ``"max allowed packet"'', not just for the three separate words.
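
The rewriting of complex words into phrases can be pictured with the following Python sketch. It only illustrates the rule stated above; `auto_phrase` is a hypothetical name, and how mnoGoSearch tokenizes such words internally is not described here.

```python
import re

# Delimiters that trigger automatic phrase search, per the text above.
DELIMITERS = re.compile(r"[-_.,/]")


def auto_phrase(word):
    """Turn a delimiter-joined word into a quoted phrase; leave simple words alone."""
    parts = [part for part in DELIMITERS.split(word) if part]
    if len(parts) < 2:
        return word
    return '"' + " ".join(parts) + '"'


print(auto_phrase("max_allowed_packet"))
print(auto_phrase("mysql"))
```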

Exact section match

Starting from the version 3.3.0, exact section match syntax is available. An exact section match query consists of a section reference (as described in the Section called Restricting search words to a section), followed by the = character (the EQUAL sign), followed by a phrase in quotes. For example, the search query ``title="search engine"'' will return the documents having a title equal to the phrase "search engine".

Exact section match is not available if SaveSectionSize is set to no.

How search handles expired documents

Expired documents are still searchable with their old content.

