If you have a set of files that changes regularly over time (old files
are deleted, new ones are added, or existing files are modified), you
can benefit from using the file ID indexing methodology.
Examples of this type of database might include an index of WWW
resources, or a USENET news spool area.
Briefly speaking, the file key methodology uses the directory paths
of the individual records as a unique identifier for each record.
To index a directory with file keys, you again specify the top-level
directory after the update command.
The command recursively traverses the directories and compares
each one with whatever has been indexed before in that same directory.
If a file is new (not in the previous version of the directory) it
is inserted into the registers; if a file was already indexed and
it has been modified since the last update, the index is also
modified; if a file has been removed since the last
visit, it is deleted from the index.
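The three cases above can be illustrated with a minimal sketch. It is not Zebra's implementation: the function names snapshot and plan_update are hypothetical, and using the file modification time to detect changes is an assumption made only for illustration.

```python
import os


def snapshot(root):
    """Map every file path below root to its modification time (ns)."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Assumption: a changed mtime means the file was modified.
            state[path] = os.stat(path).st_mtime_ns
    return state


def plan_update(previous, current):
    """Classify paths by comparing the previous snapshot with the current one."""
    # New files: present now, absent at the last update.
    inserts = sorted(p for p in current if p not in previous)
    # Modified files: present in both, but the recorded value differs.
    modifies = sorted(
        p for p in current if p in previous and current[p] != previous[p]
    )
    # Removed files: present at the last update, absent now.
    deletes = sorted(p for p in previous if p not in current)
    return inserts, modifies, deletes
```

Here the path of each file serves as the record's unique identifier, so comparing the two mappings is enough to decide which records to insert, modify, or delete.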
The resulting system is easy to administer. To delete a record you
simply have to delete the corresponding file (say, with the rm
command). To add records you create new files (or directories with
files). For your changes to take effect in the register you must run
zebraidx update with the same directory root again. This mode of
operation requires more
disk space than simpler indexing methods, but it makes it easier for
you to keep the index in sync with a frequently changing set of data.
If you combine this system with the safe update
facility (see below), you never have to take your server off-line for
maintenance or register updating purposes.
To enable indexing with pathname IDs, you must specify file as the
value of recordId in the configuration file. In addition, you should
set storeKeys to 1, since the Zebra indexer must save additional
information about the contents of each record in order to modify the
indexes correctly at a later time.
For example, to update records of group esdd located below
/data1/records/ you should type:
$ zebraidx -g esdd update /data1/records
The corresponding configuration file includes:
esdd.recordId: file
esdd.recordType: grs.sgml
esdd.storeKeys: 1
You cannot start out with a group of records indexed with simple indexing (no record IDs, as in the previous section) and then later enable file record IDs. Zebra must know from the first time you index the group that the files should be indexed with file record IDs.
You cannot explicitly delete records when using this method (that is,
by using the delete command to zebraidx). Instead, you have to delete
the files from the file system (or move them to a different location)
and then run zebraidx with the update command.
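As a rough sketch of this delete-then-update workflow, the helper below removes a record's file and builds the zebraidx command line that brings the index back in sync. The function name retire_record is hypothetical; only the -g option and the update command shown earlier in this section are used.

```python
import os


def retire_record(path, group, root):
    """Delete a record's file and return the zebraidx command that syncs the index."""
    # Removing the file is how a record is deleted with file record IDs.
    os.remove(path)
    # Use the same group and directory root as the original indexing run.
    return ["zebraidx", "-g", group, "update", root]
```

The returned list can be passed to subprocess.run(..., check=True) on a machine where zebraidx is installed. Moving the file to a different location instead of removing it has the same effect on the index.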