[OS X TeX] The metadata is in the log file

Joachim Kock jkock at start.no
Wed Sep 15 20:25:47 CEST 2004

Just some more remarks on log file parsing in general ---
sorry for being long, but I guess it is just because the
subject interests me...

> > On 14 Sep 2004, at 6:26 AM, Joachim Kock wrote:
> >
> >> The slogan is that the metadata is in the log file!
> >
> > An implied root file. Very elegant!
> >
> Moreover it relies on some de facto syntax of the log, 
> which is not documented at all and not officially supported.

This is true, and in fact I know that this does not work with
certain Windows implementations, where the sourced files are
indicated with a different syntax from that of Unix TeX.

However, this de facto syntax might actually be a stabler
standard than any invented standard for external metadata:
we may come up with some canonical XML standard today; in five 
years something else might be a la mode... :-)

> If we ever happen to have a localized version of TeX, the 
> log file will be localized and the log file should be 
> parsed accordingly.

Curiously, when scanning the log file for input files, no
natural language is involved --- it is only a question of
looking for patterns that look like file names and, in each
case, asking the system whether a readable file of that name
exists.
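
To make the idea concrete, here is a minimal sketch in Python.
It is only an illustration, not the actual implementation: the
names CANDIDATE and input_files are made up for this example,
and it assumes the Unix TeX convention that an opened file
shows up in the log right after an opening parenthesis, as in
"(./chapter1.tex".  It also ignores the fact that TeX may wrap
long log lines in the middle of a file name.

```python
import os
import re

# Assumption: a candidate file name is whatever follows an opening
# parenthesis up to the next whitespace or parenthesis.
CANDIDATE = re.compile(r"\(([^\s()]+)")

def input_files(log_text, base_dir="."):
    """Return the candidates in log_text that name readable files."""
    found = []
    for match in CANDIDATE.finditer(log_text):
        name = match.group(1)
        # Relative names are resolved against base_dir; absolute
        # names are left alone by os.path.join.
        path = os.path.join(base_dir, name)
        # No natural language involved: just ask the system.
        if os.path.isfile(path) and os.access(path, os.R_OK):
            if name not in found:
                found.append(name)
    return found
```

A parenthesised remark such as "(see the manual)" produces the
candidate "see", which is simply discarded unless a readable
file of that name happens to exist.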

> Finally, if the format file is patched and decides to 
> \wlog something else this method is lost.

Well, that would take a rather implausible mishap: the
package or patched format would have to write the name of
the particular file we are querying for.  Whatever the
purpose of the log message issued by this hypothetical
package, it is unlikely to write the name of a file unless
it actually did something with that file, in which case it
makes sense to say that the file was in fact included, in
the more general sense defined by this unknown package.

In summary I have found the mechanism to be remarkably
reliable.  But as I also wrote in the beginning of the
first mail: this is in no way meant as a true metadata
candidate, it is just a little gadget that in some cases
saves you from a mistake, and in most cases makes no
difference at all.

It is also possible to read off the format and the tex engine
from the log file, cf. "This is e-TeXk... ... format=latex...",
but in this case, as you point out, there is a heavy dependency
on the exact wording of the message printed by the programme.
While this happens to work well for etex and pdftex and some
other standard engines, I have no idea how this looks for more
exotic engines and formats like ConTeXt, Omega, or XeTeX.  (By
the way, if you are reading this and you happen to use any of
these three, I would be grateful if you would send me a
typical log file!)
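
As a rough sketch of what reading off the engine and format
could look like (Python; the names BANNER, FORMAT and
engine_and_format are invented for this example), assuming the
first line of the log has the shape quoted above, i.e. "This is
<engine>" followed somewhere by "format=<name>":

```python
import re

# Assumption: the engine name is the first token after "This is",
# and the format name follows "format=" on the same line.
BANNER = re.compile(r"This is ([^\s,]+)")
FORMAT = re.compile(r"format=([^\s)]+)")

def engine_and_format(log_text):
    """Return (engine, format) from the first log line, or None for each."""
    lines = log_text.splitlines()
    first_line = lines[0] if lines else ""
    engine = BANNER.search(first_line)
    fmt = FORMAT.search(first_line)
    return (engine.group(1) if engine else None,
            fmt.group(1) if fmt else None)
```

Any engine whose banner is worded differently simply yields
None, which is exactly the dependency on the message format
mentioned above.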

An even subtler task is to extract error messages with filenames
and line numbers.  (Unless compiling with the option
--file-line-error-style...)  This requires quite an arsenal of
heuristics, and you are forced to devise a regexp for each possible
error message.  Here the problem is often that you get false
positives, and often the culprit is some package that writes
messages you did not take into account when you prepared the
regexps --- this is another problem you also pointed out.  Furthermore,
some error messages come without error line information, in
which case you only have a short one-line excerpt from the error
message which you can use to search the source file for the
occurrence of the error.  All this seems to be an awful lot of
work to do for each tex run, but scanning a few text files is
accomplished in a split second, even on computers from the previous
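
A much simplified sketch of such an error scan, in Python (the
function and pattern names are hypothetical, and a real scanner
needs far more heuristics than this).  It assumes two shapes of
message: "file:line: message" as produced with the
file-line-error option, and the classic "! message" line
followed later by a line of the form "l.<number> ...".

```python
import re

FILE_LINE = re.compile(r"^(.+?):(\d+): (.+)$")   # file:line: message
BANG = re.compile(r"^! (.+)$")                   # classic "! message"
LINE_NO = re.compile(r"^l\.(\d+)\b")             # classic "l.12 ..."

def extract_errors(log_text):
    """Return a list of dicts with keys file, line and message."""
    errors = []
    pending = None  # classic error still waiting for its line number
    for line in log_text.splitlines():
        m = FILE_LINE.match(line)
        if m:
            errors.append({"file": m.group(1),
                           "line": int(m.group(2)),
                           "message": m.group(3)})
            pending = None
            continue
        m = BANG.match(line)
        if m:
            # The classic style does not name the file; finding it
            # would need further heuristics.
            pending = {"file": None, "line": None, "message": m.group(1)}
            errors.append(pending)
            continue
        m = LINE_NO.match(line)
        if m and pending is not None:
            pending["line"] = int(m.group(1))
            pending = None
    return errors
```

Errors that come without a line number keep "line" set to None,
so the message excerpt is all that is left to search the source
with.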


Joachim Kock <kock at math.uqam.ca>
Département de mathématiques -- Université du Québec à Montréal
Case postale 8888, succursale centre-ville
Montréal (Québec), H3C 3P8 -- Canada
