# [OS X TeX] Building new formats (MacTeX)

Maarten Sneep maarten.sneep at xs4all.nl
Sat Sep 16 12:22:01 CEST 2006

Hi Rowland,

the wrong question, please be as exact as you possibly can in asking

So what do I think your question is? How do I customise my TeX
install to include the UK hyphenation patterns? Since I normally use
i-Installer to install and customise my tex, I had to do quite a bit
of digging to answer this without the short but for some reason
unacceptable (to you anyway) "Just run the i-Installer configure stage".

The answer it seems comes in two stages: fmtutil and its
configuration, and the language configuration files for the various
formats. Yes, careful digging reveals that there is more than one,
and they are not all called "language.dat"

So, one step at a time: fmtutil will take the appropriate actions for
you to build a format. I would advise against doing it manually, even
though calling pdfetex -ini with some other parameters is certainly
possible. teTeX, however, includes some management utilities to make
life easier; just use them.

OK, fmtutil. This tool is a shell script, so in principle one can
figure out what it does. The tool has an --edit option that you can
use to edit the configuration file; however, there are some permission
checks on that, and for me this doesn't work. Luckily, the default
name for the configuration file can be found in the script:
"fmtutil.cnf", and kpsewhich can be used to find which one is used:
/usr/local/teTeX/share/texmf/web2c/fmtutil.cnf.

There are some notes at the top of the file, and I copied an example:

# The format of the table is:
# format	engine		pattern-file	arguments
# The last part of "arguments" must be the name of the file to run
# initex (or another "ini"-engine) on.

pdflatex	pdfetex		language.dat	-translate-file=cp227.tcx *pdflatex.ini

so the pdflatex format uses the pdfetex engine, uses language.dat for
the language configuration, and needs some codepage translation. The
macros themselves are loaded from pdflatex.ini. fmtutil loops
over all formats that are not commented out, and uses these
parameters to create the format in the right location.
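
A minimal sketch of reading that table from the shell, assuming the four-column layout described in the comment above. The helper name list_formats is made up for this illustration and is not part of teTeX; it only prints the columns and does not build anything.

```sh
# Hypothetical helper: list the active entries of an fmtutil.cnf-style table.
# Reads the table on stdin; prints "format engine pattern-file ini-file".
list_formats() {
    awk '
        /^[ \t]*#/ { next }        # skip comment lines (disabled formats)
        NF < 4     { next }        # skip blank or incomplete lines
        {
            ini = $NF              # last argument: the file the ini-engine runs on
            sub(/^\*/, "", ini)    # drop the leading "*" some entries carry
            print $1, $2, $3, ini
        }
    '
}

# Sample input: the header comment and the pdflatex line quoted above.
list_formats <<'EOF'
# format  engine  pattern-file  arguments
pdflatex  pdfetex  language.dat  -translate-file=cp227.tcx *pdflatex.ini
EOF
# prints: pdflatex pdfetex language.dat pdflatex.ini
```

To try it on a real install, feed it the file kpsewhich reports, for example `list_formats < "$(kpsewhich fmtutil.cnf)"`.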

So, for formats where both a format and a language file are listed
here, it is as easy as finding the name of the pattern description file
(mostly language.dat, sometimes something else), and asking which copy
is used with:

kpsewhich -progname="engine" pattern-file

where engine and pattern-file refer to the table columns given above.
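
To see which lookups the table implies, the sketch below turns each active row into the matching kpsewhich call. pattern_lookups is a hypothetical helper, not a teTeX tool; it only prints the commands rather than running them, and it assumes that rows with "-" in the pattern-file column load no pattern file.

```sh
# Hypothetical helper: turn fmtutil.cnf rows (stdin) into kpsewhich lookups.
# Duplicate engine/pattern-file pairs are reported once.
pattern_lookups() {
    awk '
        /^[ \t]*#/ { next }                      # skip comment lines
        NF >= 4 && $3 != "-" {                   # "-" means no pattern file
            print "kpsewhich -progname=" $2 " " $3
        }
    ' | sort -u
}

# Sample rows in the layout quoted above (the tex row has no pattern file).
pattern_lookups <<'EOF'
tex       tex      -             tex.ini
latex     pdfetex  language.dat  -translate-file=cp227.tcx *latex.ini
pdflatex  pdfetex  language.dat  -translate-file=cp227.tcx *pdflatex.ini
EOF
# prints: kpsewhich -progname=pdfetex language.dat
```

Each printed line can then be pasted into the terminal to see which copy of the file that engine would pick up.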

I seem to recall you wanted babel in plain tex as well. On Mac OS X I
assume this to mean that you want babel support in pdftex. This is a
bit more tricky, as the following fragment shows. Well, at least the
instructions are there…

# Change "tex.ini -> bplain.ini" and "- -> language.dat"
# if you want babel support in tex. Add -translate-file=cp227.tcx before tex.ini
# if you want to make all characters directly "printable" for
# any \write (instead of ^^xy).

So to add babel to tex and pdftex, change the tex, resp. pdftex
format lines to:

tex	tex	language.dat	-translate-file=cp227.tcx bplain.ini
pdftex	pdfetex	language.dat	-translate-file=cp227.tcx bplain.ini

I hope that takes care of the format creation.

Now on to the language definition files. This can be slightly
different, depending on the exact flavour of the tex format itself. I
won't go into Context here, partly because I don't know the finer
details, partly because I didn't see you mention it.

The formats you seem to be using are plain tex and latex. They both
use language.dat (at least after the changes listed above), which
shortens the discussion. You'll want to have the language.dat file in
the texmf.local tree. Copy the language.dat file given by kpsewhich
to /usr/local/teTeX/share/texmf.local/tex/generic/config/language.dat
and open it in your text editor. If you want to deal with other
formats, please copy the appropriate language files into the local
tree before editing as well. First of all it gives you a backup, and
secondly it will prevent i-Installer from trampling all over your
changes, should you decide later on to use it anyway.

For UK English, we find in that file:

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% UK english, TWO LINES! To enable these lines, remove %! and the space.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%! british	ukhyphen.tex % unavailable in teTeX due to license problem!
%! =UKenglish

The license problem has been commented upon before. This defines the
(babel) name british and loads the patterns given in ukhyphen.tex
under that label. The second line defines the name UKenglish to be
the same as british. All you need to do here is follow the
instructions: remove '%! ' at the start of the two lines.
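
If you prefer not to do this by hand, here is a small sketch using sed. The function name enable_uk is invented for this example, and it assumes the two lines look exactly like the ones quoted above. It writes the result to standard output and leaves the input file alone, so you can inspect it before replacing anything.

```sh
# Hypothetical helper: uncomment the two UK English lines of a language.dat copy.
# $1 is the file to read; the edited text goes to stdout.
enable_uk() {
    sed -e 's/^%! \(british[[:space:]]\)/\1/' \
        -e 's/^%! \(=UKenglish\)/\1/' "$1"
}

# Try it on a scratch file holding the two lines quoted above.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
%! british ukhyphen.tex % unavailable in teTeX due to license problem!
%! =UKenglish
EOF
enable_uk "$tmp"    # both lines come out without the leading "%! "
rm -f "$tmp"
```

Other lines that start with "%! " are left untouched, so languages you did not ask for stay disabled.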

There is another bit of instruction in the file:

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% CAUTION: the first language will be the default if no style-file
%          (e.g. german.sty) is used.
% Since version 3.0 of TeX, hyphenation patterns for multiple languages are
% possible. Unless you know what you are doing, please let the american
% english patterns be the first ones. The babel system allows you to
% easily change the active language for your texts. For more information,
% have a look to the documentation in texmf/doc/generic/babel.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

activate the UK english hyphenation patterns.

Enable all other languages you care to use, and save the file.

Now run sudo -H -u root fmtutil-sys --all to re-create the format,
with the configuration you just specified. The sudo is needed for the
permissions, the -H -u root options for sudo set the username and
home directory to the actual root user so that files in your ~/
itself.

I hope this clears things up for you. As to the other implied
question: how do you figure this out? Some experience with the
cryptic output of --help, -? and -h of the various tools helps. A
constant reminder that man pages list only what you need to know, and
not a comma extra, and some experience with maintaining a Linux
system at work.

There are two files in most TeX distributions (well, unix-distributions
anyway) that control everything: texmf.cnf and
fmtutil.cnf. Some starting text can be found in /usr/local/teTeX/
details in the comments in both texmf.cnf and fmtutil.cnf. Oh, and
some experience with programming helps: this system was developed by
programmers, and as you have figured out by now: user friendliness
wasn't at the top of their list of priorities.

Regards,

Maarten

PS, I've added some minor remarks without explanation below. Right
near the end there is a question on how MacTeX deals with a pre-
existing texmf.local tree when updating: is it overwritten or not?

On 15-sep-2006, at 22:32, Rowland McDonnell wrote:

> According to a note in the language.dat file, ukhyphen.tex is
> unavailable in teTeX due to licensing problems.  I'd guess the same
> applies to other distributions - are you sure UKEnglish is included
> with
> TeXLive?

Yes, in gwtex. Thomas Esser and Gerben Wierda have slightly different
opinions on what can or can't be included. Otherwise you probably can
obtain it from CTAN: http://www.ctan.org/search.html

>>> I was thinking about modifying the Babel setup so that I could have
>>> the existing languages plus the one I need, with the one I need set
>>> up as the default.
>>>
>>> I see that to do this, I need to edit the appropriate language.dat
>>> file.
>>
>> a simple find command in the terminal gives that these are all
>> language.dat files in the texmf trees.
>
> Okay - I've printed out the 'find' man page and learnt how to do this
> (details on how to do so at the end of this email).  But how can I
> tell
> which file is used for which format?

The one returned by kpsewhich, since that is the same routine used by
tex itself.

> At some point, iniTeX will run, and I need to make sure it'll read the
> appropriate file when it runs - for every format I'm rebuilding.
> Finding language.dat files isn't the problem: the problem is making
> sure
> that the one I want to be read, is the one that's read.

Sorry, I was barking up the wrong tree, I hope the discussion of
fmtutil above did in fact help.

>> /usr/local/teTeX/share/texmf.local/tex/generic/config/language.dat
>> /usr/local/teTeX/share/texmf.gwtex/tex/generic/config/language.dat
>> /usr/local/teTeX/share/texmf.tetex/tex/generic/config/language.dat
>> /usr/local/teTeX/share/texmf.tetex/tex/lambda/config/language.dat
>> /usr/local/teTeX/share/texmf.tetex/tex/platex/config/language.dat
>>
>> The first one shadows the next two for all tex formats, except lambda
>> and platex.
>
> I don't understand what you mean by this.  Could you tell me where
> it's
> explained in the documentation (if anywhere)?

The howtexfindsfiles mentioned above, plus comments in texmf.cnf. Or
the output of kpsewhich, which is the routine actually used by tex &
friends.

>> So it seems that in practice, you can just limit yourself
>> to the first one.
>
> What I actually get is this:
>
> Hattie:teTeX rowland$ find . -name "language.dat"
> ./share/texmf.gwtex/tex/generic/config/language.dat
> ./share/texmf.tetex/tex/generic/config/language.dat
> ./share/texmf.tetex/tex/lambda/config/language.dat
> ./share/texmf.tetex/tex/platex/config/language.dat
>
> [snip]
>
> [Gerben Wierda says elsewhere:]
>> The first one probably does not exist on a pristine install of
>> MacTeX. The second one does, but it should be copied to the location
>> of the first one before being edited or it will be overwritten on a
>> next install.
> [end Gerben Wierda]
>
> What I'd like to do is have a single local language.dat file that is
> always read when I run iniTeX to rebuild all formats.

See discussion of fmtutil.cnf

> (and some sort of babel config file - damned if I can find any info on
> how to set up Babel to give me a default language other than US
> English)

Babel is not part of the ...TeX format, so there is no default there.
Load the package with \usepackage[english]{babel} and in latex you'll
get the right patterns. For pdftex (plain), you'll need to \input
some file, and \def some things, but I'm at a loss here. However, at
least I now know how to include the languages in the first
place.

> How can I find out how to do this?
>
>>> I've no idea where to start looking - can anyone help me find out
>>> how to invoke kpsewhich for this job?
>>
>> kpsewhich -help gives basic (somewhat cryptic) help. from that:
>
> Umm.  The help is indeed cryptic.  I can't get anywhere with it.
>
>>
>> -progname=NAME : set the program name to NAME (latex, pdftex,
>> lambda, …)
>> -format=NAME : the file format to search for (or to limit the result
>> to).
>
> And I'm afraid this is just as cryptic as the man page.
>
>> The choice of the name 'format' in this context is a bit unfortunate:
>> you want the
>> -progname argument, not the -format argument.
>
> If I knew why you were explaining this, I might be able to follow you
> better.
>
> Could you explain why you are telling me that there are '-progname' and
> '-format' options?
>
> I'm completely baffled by this.  Ah!  No, having read my original
> email
> on the subject, I am now less baffled.   Perhaps you could [snip] less
> of the original text?

snipping too little makes for rather tedious reading, and makes it a
lot harder to follow the line of my explanation.

>
>> kpsewhich language.dat -progname=pdflatex
>> gives the language settings file for pdflatex.
>
> Okay.  But what does that mean, exactly?  Why do I need to know this?
> How could I have worked it out for myself?
>
> (Thank you for telling me, but I'd like to be able to find this
> sort of
> thing out without having to ask - frankly, it'd be nice to have some
> time, and
> I'm not sure that the answer you've given me is actually any help
> to me
> at all, since I don't see how it applies to init..)
>
> The question I still need an answer to is: how can I find out which
> particular language.dat files will be used when I build formats?
>
> - I take it that there are multiple methods of control, depending on
> format building methods.
>
>> Since most formats use the same file, I would advise to use
>> sudo -H fmtutil-sys --all
>> to recompile all formats at once.
>
> How do I find out what that command does?  That is, when you say 'All
> formats', how do I find out which formats will be rebuilt?  And
> according to which scheme?
>
>> This will probably prevent some
>> surprises later on.
>
> Okay.
>
> I need to find out how to replace all the format files used for Plain
> TeXing and LaTeXing with the MacTeX distribution, and do so using my
> local language.dat (and perhaps other config) files.
>
>> The format of the language.dat file is documented in the file itself
>
> There is no explanation of the format of the file that I can see - not
> an inadequate explanation: absolutely no explanation of any sort.

No, not for the format. Because I hardly ever do this by hand, I'm
stumbling as I go along. That is the reason this message is somewhat
decided to dive deeper into fmtutil.

>> (the languages are all there,
>> but with most commented out, so there is no need to figure out what a
>> language should be called).
>
> It's not the modifying of the file that's the problem, but working out
> where to put my copy of the modified file so that it's 1) used and 2)
> not overwritten; then I need to work out how to re-build all the
> formats.

1) use kpsewhich with the engine and file name to see if the file
you had in mind is indeed found.
2) copy that file to reside within the texmf.local tree, and try (1)
again to see if it is found instead.
Now, depending on how exactly you are going to update your tex:
i-Installer will leave texmf.local alone. I don't know about MacTeX,
perhaps someone else can comment on this. You may want to copy your
fmtutil.cnf into the texmf.local tree as well. With the commands I
gave it is not possible to store the configuration in your home
directory.
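
Step 2 can be sketched as a small shell function. copy_to_local is a name invented for this illustration; it assumes the local tree should mirror the path below the texmf root (for example tex/generic/config/language.dat), as in the listing quoted earlier in this thread.

```sh
# Hypothetical helper: copy a config file into the local tree, keeping the
# same path below the texmf root. Prints the location of the new copy.
copy_to_local() {
    src=$1          # file reported by kpsewhich
    src_root=$2     # texmf tree it currently lives in
    local_root=$3   # the texmf.local tree
    rel=${src#"$src_root"/}                     # path relative to the tree root
    mkdir -p "$local_root/$(dirname "$rel")" &&
    cp "$src" "$local_root/$rel" &&
    printf '%s\n' "$local_root/$rel"
}

# Usage on a real install (paths as listed in this thread; writing below
# /usr/local will probably need sudo):
#   src=$(kpsewhich -progname=pdfetex language.dat)
#   copy_to_local "$src" /usr/local/teTeX/share/texmf.gwtex \
#                 /usr/local/teTeX/share/texmf.local
#   kpsewhich -progname=pdfetex language.dat   # repeat step 1
# Assumption: if the local tree has an ls-R file, it may need refreshing
# (texhash) before the new copy is found.
```

If the second kpsewhich call still reports the old file, the copy is not in a place the search path looks at first, and it is worth checking before rebuilding the formats.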

> You say that using the fmtutil command is the right thing to do - is
> there anything to explain what I can expect to see happen if I use
> it as
> directed on a default MacTeX installation?  The fmtutil man page is an
> unusually terse piece of work, and I can find no other
> documentation for
> fmtutil.

See above, with some explanation on how I figured this out myself
