

Path searching

This chapter describes the generic path searching mechanism Kpathsea provides. For information about searching for particular file types (e.g., TeX fonts), see the next chapter.

Searching overview

A search path is a colon-separated list of path elements, which are directory names with some extra frills. A search path can come from (a combination of) many sources; see below. To look up a file `foo' along a path `.:/dir', Kpathsea checks each element of the path in turn: first `./foo', then `/dir/foo', (typically) returning the first one that exists.
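The basic lookup just described can be sketched in a few lines of Python. This is only an illustration of the idea, not Kpathsea's code (which is C); the function name `find_along_path` and its `sep` parameter are made up for the example, and none of the expansions described later are applied.

```python
import os


def find_along_path(name, search_path, sep=":"):
    """Return the first existing DIR/NAME along search_path, or None.

    Hypothetical helper: each path element is treated as a plain
    directory name, with no database and no expansions.
    """
    for directory in search_path.split(sep):
        candidate = os.path.join(directory, name)
        if os.path.exists(candidate):
            return candidate  # first match wins; later elements are not checked
    return None
```

With a path of `.:/dir' and a file name `foo', the sketch tries `./foo' and then `/dir/foo', in that order.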

The "colon" and "slash" mentioned here aren't necessarily `:' and `/' on non-Unix systems. Kpathsea tries to adapt to other operating systems' conventions.

To check a path element e, Kpathsea first sees if a prebuilt database (see below) applies to e, i.e., if the database is in a directory that is a prefix of e. If so, the path specification is matched against the contents of the database.

If the database does not exist, or does not apply to this path element, or contains no matches, the filesystem is searched. Kpathsea constructs the list of directories that correspond to this path element, and then checks in them for the file being searched for. (To help speed future lookups of files in the same directory, the directory in which a file is found is floated to the top of the directory list.)

Each path element is checked in turn: first the database, then the disk. Once a match is found, the searching stops and the result is returned. This avoids possibly-expensive processing of path specifications that are never needed on a particular run.

Although the simplest and most common path element is a directory name, Kpathsea supports additional features in search paths: layers of default values, environment variable names, config file values, users' home directories, and recursive subdirectory searching. Thus, we say that Kpathsea expands a path element, meaning that it gets rid of all the magic specifications and gets down to the basic directory name or names. This process is described in the sections below. The expansions happen in the same order as the sections.

Exception to the above: If the filename being searched for is absolute or explicitly relative, i.e., starts with `/' or `./' or `../', Kpathsea simply checks if that file exists; it is not looked for along any paths.
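A rough Python sketch of this exception follows. The names `is_explicit` and `lookup` are hypothetical, and the path search in the second branch is the simplest possible stand-in for the real one.

```python
import os


def is_explicit(name):
    """True if name is absolute or explicitly relative."""
    return name.startswith(("/", "./", "../"))


def lookup(name, search_path, sep=":"):
    """Check explicit names directly; search the path for all others."""
    if is_explicit(name):
        # The search path is not consulted at all in this case.
        return name if os.path.exists(name) else None
    for directory in search_path.split(sep):
        candidate = os.path.join(directory, name)
        if os.path.exists(candidate):
            return candidate
    return None
```

Note that a name such as `dir/foo' is neither absolute nor explicitly relative under this test, so it is still looked for along the path.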

Path sources

A search path can come from many sources. In priority order (meaning Kpathsea will use whichever it finds first):

  1. A user-set environment variable, e.g., `TEXINPUTS'.
  2. A program-specific configuration file, e.g., an `S /a:/b' line in Dvips' `config.ps'.
  3. A line in a Kpathsea configuration file `texmf.cnf', e.g., `TEXINPUTS=/c:/d'. See section Config files.
  4. The compile-time default (specified in `kpathsea/paths.h').

In any case, once the path specification to use is determined, its evaluation is independent of its source. These sources may also be combined via default expansion. See the next section.
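The priority order amounts to "take the first source that is set". A minimal sketch, assuming each source has already been read into a string or is None when unset; the function and parameter names are invented for the illustration, and default expansion is not handled here.

```python
def choose_path(env_value=None, program_config=None,
                texmf_cnf=None, compile_default=None):
    """Return (source, value) for the highest-priority source that is set."""
    sources = (
        ("environment variable", env_value),
        ("program config file", program_config),
        ("texmf.cnf", texmf_cnf),
        ("compile-time default", compile_default),
    )
    for source, value in sources:
        if value is not None:
            return source, value
    return None, None
```

For example, with no environment variable and no program-specific setting, a `texmf.cnf' value of `/c:/d' is chosen over the compile-time default.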

You can see each of these values for a given search path by using the debugging options of Kpathsea or your program. See section Debugging.

Config files

As mentioned above, Kpathsea reads runtime configuration files named `texmf.cnf' for search path definitions. The path used to search for them is constructed in the usual way, as described above (except that configuration files cannot be used to define the path, naturally; also, an `ls-R' database is not used to search for them, for technical reasons).

The environment variable used is `TEXMFCNF'.

Kpathsea reads all `texmf.cnf' files in the search path, not just the first one found; it uses the first definition of each variable encountered. Thus, with the (default) search path of `.:$TEXMF', values from `./texmf.cnf' override those from `$TEXMF/texmf.cnf'.
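The "first definition wins" rule can be pictured as follows. This is a sketch only: it assumes each `texmf.cnf' has already been parsed into a dictionary, and the name `merge_cnf' is hypothetical.

```python
def merge_cnf(files):
    """Merge parsed config files given in search-path order.

    `files` is a list of {variable: value} dictionaries, earliest
    (highest-priority) file first.  A variable keeps the value from
    the first file that defines it.
    """
    merged = {}
    for definitions in files:
        for name, value in definitions.items():
            merged.setdefault(name, value)  # later definitions are ignored
    return merged
```

Given a `./texmf.cnf' and a `$TEXMF/texmf.cnf' that both define `TEXINPUTS', the merged result carries the value from `./texmf.cnf', while variables defined only in the second file are still picked up.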

Here is a fragment from the distributed `texmf.cnf' file illustrating the format:

% TeX input files -- i.e., anything to be found by \input or \openin [...]
latex209_inputs = .:$TEXMF/tex/latex209//:$TEXMF/tex//
latex2e_inputs = .:$TEXMF/tex/latex2e//:$TEXMF/tex//
TEXINPUTS = .:$TEXMF/tex//
TEXINPUTS.latex209 = $latex209_inputs
TEXINPUTS.latex2e = $latex2e_inputs
TEXINPUTS.latex = $latex2e_inputs
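To make the fragment concrete, here is a small Python sketch that reads lines of this shape. It rests on assumptions drawn only from the fragment itself: a line starting with `%' is a comment, a definition has the form `name = value' with optional spaces around the `=', and a `.program' suffix on the name restricts the definition to that program. The functions `parse_cnf' and `cnf_value' are hypothetical, and values are returned unexpanded.

```python
def parse_cnf(text):
    """Parse texmf.cnf-style text into a {name: value} dictionary."""
    definitions = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("%"):
            continue  # blank line or comment (assumed comment syntax)
        name, equals, value = line.partition("=")
        if not equals:
            continue  # not a `name = value' line; skipped in this sketch
        # Keep the first definition of a name within the file.
        definitions.setdefault(name.strip(), value.strip())
    return definitions


def cnf_value(definitions, name, program=None):
    """Prefer NAME.PROGRAM over plain NAME when a program is given."""
    if program is not None:
        qualified = f"{name}.{program}"
        if qualified in definitions:
            return definitions[qualified]
    return definitions.get(name)
```

Under these assumptions, asking for `TEXINPUTS' on behalf of a program named `latex2e' yields `$latex2e_inputs', while a program with no qualified entry gets the plain `TEXINPUTS' value.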

This format has obvious similarities to Bourne shell scripts: change the comment character to #, disallow spaces around the =, and get rid of the .program convention, and it could be run through the shell. But there seemed little advantage to doing this, since all the information would have to be passed back (with echo's, presumably) to Kpathsea and parsed there anyway, since the sh process couldn't affect its parent's environment.

The implementation of all this is in `kpathsea/cnf.c'.

Default expansion

If the highest-priority search path (in the list in the previous section) contains an extra colon (i.e., leading, trailing, or doubled), Kpathsea inserts the next-highest-priority search path that is set at that point. If that search path has an extra colon, the same happens with the next-highest. (An extra colon in the compile-time default value has unpredictable results, and may cause the program to crash, so installers beware.)

For example, given

setenv TEXINPUTS /home/karl:

and a `TEXINPUTS' value from `texmf.cnf' of

.:$TEXMF//tex

then the final value used for searching will be:

/home/karl:.:$TEXMF//tex

You can trace this by debugging "paths" (see section Debugging).

Minor technical point: Since it would be useless to insert the default value in more than one place, Kpathsea changes only one extra `:' and leaves any others in place (where they will eventually be effectively equivalent to `.', i.e., the current directory). It checks first for a leading `:', then a trailing `:', then a doubled `:'.
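The rule above (leading, then trailing, then doubled colon, and only one replacement) can be sketched like this. The function name is hypothetical, and treating a path that consists of a lone colon as simply the default is a choice made for this sketch rather than something stated above.

```python
def expand_default(path, default, sep=":"):
    """Insert `default` at the first extra separator found in `path`."""
    if path == sep:
        return default  # sketch's choice for a lone colon
    if path.startswith(sep):
        return default + path  # leading colon
    if path.endswith(sep):
        return path + default  # trailing colon
    doubled = path.find(sep + sep)
    if doubled != -1:
        # Put the default between the two colons; any further
        # extra colons are left in place.
        return path[:doubled + 1] + default + path[doubled + 1:]
    return path  # no extra colon: the default is not used
```

Applied to the example, a path of `/home/karl:' with a default of `.:$TEXMF//tex' comes out as `/home/karl:.:$TEXMF//tex'.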

Variable expansion

`$foo' or `${foo}' in a path element is replaced by (1) the value of an environment variable `foo' (if it is set); (2) the value of `foo' from `texmf.cnf' (if any such exists); (3) the empty string.

If the character after the `$' is alphanumeric or `_', the variable name consists of all consecutive such characters. If the character after the `$' is a `{', the variable name consists of everything up to the next `}' (braces are not balanced!). Otherwise, Kpathsea gives a warning and ignores the `$' and its following character.

Remember to quote the `$''s and braces as necessary for your shell.

Kpathsea cannot see shell variables that have not been exported to the environment.

For example, given

setenv TEXMF /home/tex
setenv TEXINPUTS .:$TEXMF:${TEXMF}new

the final `TEXINPUTS' path is the three directories:

.:/home/tex:/home/texnew

You can trace this by debugging "paths" (see section Debugging).
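The expansion rules in this section can be sketched as below. Everything here is illustrative: the function name is invented, the environment and `texmf.cnf' values are passed in as plain dictionaries, and the handling of a `${' with no closing brace (the rest of the string is taken as the name) is an assumption, since that case is not described above.

```python
import string
import warnings

NAME_CHARS = set(string.ascii_letters + string.digits + "_")


def expand_vars(elt, env, cnf=None):
    """Replace $foo and ${foo} in elt using env, then cnf, then ''."""
    cnf = cnf or {}
    out = []
    i = 0
    while i < len(elt):
        if elt[i] != "$":
            out.append(elt[i])
            i += 1
            continue
        following = elt[i + 1] if i + 1 < len(elt) else ""
        if following in NAME_CHARS:
            end = i + 1
            while end < len(elt) and elt[end] in NAME_CHARS:
                end += 1
            name = elt[i + 1:end]
        elif following == "{":
            close = elt.find("}", i + 2)  # next brace; no balancing
            if close == -1:
                close = len(elt)  # assumption: unterminated name runs to the end
            name = elt[i + 2:close]
            end = close + 1
        else:
            warnings.warn(f"unrecognized variable construct in {elt!r}")
            i += 2  # ignore the `$' and the character after it
            continue
        if name in env:  # a variable set to '' still counts as set
            out.append(env[name])
        else:
            out.append(cnf.get(name, ""))
        i = end
    return "".join(out)
```

With `TEXMF' set to `/home/tex' in the environment dictionary, `.:$TEXMF:${TEXMF}new' expands to `.:/home/tex:/home/texnew', as in the example.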

Tilde expansion

A leading `~' or `~user' in a path element is replaced by the current or user's home directory, respectively.

If user is invalid, or the home directory cannot be determined, Kpathsea uses `.' instead.

For example,

setenv TEXINPUTS ~/mymacros:

will prepend a directory `mymacros' in your home directory to the default path.
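A sketch of tilde expansion, including the fallback to `.', might look like this. The names `system_home' and `expand_tilde' and the `home_of' hook are hypothetical; the hook exists only so the lookup can be replaced for testing. Dropping a trailing slash from the home directory is a precaution of the sketch, so that a home of `/' does not produce a `//'.

```python
import os


def system_home(user):
    """Home directory of `user` ('' means the current user), or None."""
    expanded = os.path.expanduser("~" + user)
    # expanduser returns its argument unchanged when it cannot expand it.
    return None if expanded.startswith("~") else expanded


def expand_tilde(elt, home_of=None):
    """Replace a leading ~ or ~user in a path element."""
    if not elt.startswith("~"):
        return elt  # only a leading tilde is special
    user, slash, rest = elt[1:].partition("/")
    lookup = home_of if home_of is not None else system_home
    home = lookup(user) or "."  # invalid user or unknown home: use `.'
    if slash:
        home = home.rstrip("/")
    return home + slash + rest
```

If the current user's home directory is `/home/karl', the element `~/mymacros' becomes `/home/karl/mymacros'; an element naming an unknown user, such as `~nobody/tex', becomes `./tex'.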

Subdirectory expansion

A `//' in a path element following a directory d is replaced by all subdirectories of d: first those subdirectories directly under d, then the subsubdirectories under those, and so on. At each level, the order in which the directories are searched is unspecified. (It's "directory order", and definitely not alphabetical.)

If you specify any filename components after the `//', only subdirectories which contain those components are included. For example, `/a//b' would expand into directories `/a/1/b', `/a/2/b', `/a/1/1/b', and so on, but not `/a/b/c' or `/a/1'.
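The following sketch expands a single `//' breadth-first, so that directories directly under d come before deeper ones. It is an illustration under several assumptions: the name `expand_subdirs' is invented, only the first `//' in the element is handled, d itself is not included in the result, and symbolic links to directories are followed with no protection against cycles beyond what the operating system enforces.

```python
import os
from collections import deque


def expand_subdirs(elt):
    """Expand the first `//' in elt into a list of existing directories."""
    root, marker, tail = elt.partition("//")
    if not marker:
        return [elt]  # nothing to expand
    root = root or "/"
    tail = tail.strip("/")
    found = []
    queue = deque([root])
    while queue:
        parent = queue.popleft()
        try:
            names = os.listdir(parent)  # "directory order", not sorted
        except OSError:
            continue  # unreadable directory: skip it
        for name in names:
            sub = os.path.join(parent, name)
            if not os.path.isdir(sub):
                continue
            queue.append(sub)  # visit its children after this level
            candidate = os.path.join(sub, tail) if tail else sub
            if os.path.isdir(candidate):
                found.append(candidate)
    return found
```

For `/a//b' this yields directories such as `/a/1/b' and `/a/2/b' before `/a/1/1/b', and never `/a/1' or `/a/b/c'.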

I should mention one related implementation trick, which I stole from GNU find. Matthew Farwell `<dylan@ibmpcug.co.uk>' suggested it, and David MacKenzie `<djm@gnu.ai.mit.edu>' implemented it (as far as I know).

The trick is that in every real Unix implementation (as opposed to the POSIX specification), a directory which contains no subdirectories will have exactly two links (namely, one for `.' and one for `..'). That is to say, the st_nlink field in the `stat' structure will be two. Thus, we don't have to stat everything in the bottom-level (leaf) directories--we can just check st_nlink, notice it's two, and do no more work.

But if you have a directory that contains one subdirectory and five hundred files, st_nlink will be 3, and Kpathsea has to stat every one of those 501 entries. Therein lies slowness.
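In Python terms the trick looks roughly like this. The function name is hypothetical, and the sketch assumes a filesystem that follows the Unix link-count convention described above; on filesystems that report some other count for directories it simply falls through to the slow scan.

```python
import os


def subdirectories(path):
    """Return the names of the subdirectories of path."""
    if os.stat(path).st_nlink == 2:
        # Only `.' and `..' link to this directory, so it has no
        # subdirectories and its entries need not be examined.
        return []
    # Otherwise every entry has to be checked individually.
    return sorted(
        name for name in os.listdir(path)
        if os.path.isdir(os.path.join(path, name))
    )
```

The saving is in the first branch: a leaf directory holding hundreds of font files costs one `stat' call instead of one per file.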

You can disable the trick by undefining UNIX_ST_LINK in `kpathsea/config.h'. (It is undefined by default except under Unix.)

Unfortunately, in some cases files in leaf directories are stat'd: if the path specification is, say, `$TEXMF/fonts//pk//', then files in a subdirectory `.../pk', even if it is a leaf, are checked. The reason cannot be explained without reference to the implementation, so read `kpathsea/elt-dirs.c' (search for `may descend') if you are curious. (And if you can find a way to solve the problem, please let me know.)

Filename database (ls-R)

Kpathsea goes to some lengths to minimize disk accesses for searches (see section Subdirectory expansion). Nevertheless, at installations with enough directories, doing a linear search of each possible directory for a given file can take an excessively long time ("excessive" depending on the speed of the disk, whether it's NFS-mounted, how patient you are, etc.). In practice, the union of font directories from the Dvips(k) and Dviljk distributions is large enough for searching to be noticeably slow on typical machines these days.

Therefore, Kpathsea can use an externally-built "database" that maps files to directories, thus avoiding the need to exhaustively search the disk. By fiat, you must name the file `ls-R', and put it at the root of the TeX installation hierarchy (`$TEXMF' by default). Kpathsea does variable expansion on the `$TEXMF', naturally, so you can use different `ls-R''s for different trees, if you are testing new ones. However, one and only one `ls-R' is read; it is not searched for along any paths.

You can build `ls-R' with the command

ls -R /your/root/dir >ls-R

if your ls produces the right output format (see section Database format). GNU ls, for example, outputs in this format. It is probably best to do this via cron, so changes in the installed files will be automatically reflected (albeit with some delay) in the database.

If your system uses symbolic links, the command ls -LR will be more reliable than plain ls -R. The former follows the symbolic links to the real files, which is what Kpathsea needs.

Kpathsea warns you if it finds an `ls-R' file, but the file does not contain any usable entries. The usual culprit is using just ls -R to generate the `ls-R' file instead of ls -R /your/dir. Kpathsea looks for lines starting with `/', to improve reliability with unusual filenames (specifically, those ending with a `:').

Because the database may be out-of-date for a particular run (e.g., if a font was just built with MakeTeXPK), if a file is not found in the database, by default Kpathsea goes ahead and searches the disk. If a particular path element begins with `!!', however, only the database will be searched for that element, never the disk. If the database does not exist, nothing will be searched. Because this can greatly surprise users ("I see the font `foo.tfm' when I do an ls; why can't Dvips find it?"), I do not recommend using this feature.
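The effect of `!!' on a single path element can be sketched as follows. This is a simplification under stated assumptions: the database is modeled as a dictionary from file name to a collection of directories (or None when no `ls-R' exists), the element is a plain directory with no further expansion, and the check for whether the database applies to the element is left out. The function name is hypothetical.

```python
import os


def search_element(elt, name, database):
    """Look for `name` in one path element: database first, then disk."""
    db_only = elt.startswith("!!")
    directory = elt[2:] if db_only else elt
    if database is not None and directory in database.get(name, ()):
        return os.path.join(directory, name)
    if db_only:
        return None  # `!!': never fall back to the disk
    candidate = os.path.join(directory, name)
    return candidate if os.path.exists(candidate) else None
```

This shows the surprise mentioned above: a file that exists on disk but is missing from the database is found through a plain element, yet not through the same element prefixed with `!!'.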

Database format

The "database" read by Kpathsea is a line-oriented file of plain text. The format is that generated by GNU (and perhaps other) ls programs given the `-R' option.

For example, here's the first few lines of `ls-R' on my system:

bibtex
dvips
fonts
ini
ls-R
mf
tex

/usr/local/lib/texmf/bibtex:
bib
bst
doc

/usr/local/lib/texmf/bibtex/bib:
asi.bib
bibshare
btxdoc.bib

On my system, `ls-R' is about 30K bytes.

