About boundary characters

Doug McKenna doug at mathemaesthetics.com
Thu Sep 19 02:21:48 CEST 2019

Karl, Didier -

My code faithfully duplicates DEK's algorithm.  His famous comment about "premature optimization" does not apply to it, because his code for appending characters to the layout was optimized into "post-matured" spaghetti.  I never tried to rewind its clock, so my C code is functionally the same pasta, though it is easier to read IMNSHO.

Looking at my comments and code, written a few years back during my Vulcan mind-meld with the WEB source, it seems that a boundary character is used to prevent ligatures and kerns from occurring when two or more adjacent characters are in different fonts.

The thing is, there's only one boundary character per TFM font.  Therefore, it kind of by definition has to serve as some kind of generic flag in multiple situations.  There are no express metrics stored for a boundary character per se in the TFM, but if it's a legal character code (between 0 and 255 for TFM), then presumably that character in the font can have metrics, usually of zero width, though nothing precludes a non-zero width.

Unfortunately, a character whose width index is zero is formally considered missing from the TFM font.  This saves space, because no other bit has to be stored somewhere in the font data to declare a character code between |bc| and |ec| as missing (see the char_exists() macro in the WEB source; it tests for a positive width index, and index 0 always maps to a width of zero).
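
For illustration, here is a minimal C sketch of that existence test.  It is not taken from tex.web or from any actual re-implementation: the TfmChars struct, its field names, and tfm_char_exists() are hypothetical, and the sketch assumes the per-character width indices have already been unpacked into an array.  As far as I can tell from tex.web, char_exists() looks only at the width index in the char_info word; the |bc|..|ec| range check folded in here is done separately in TeX.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of the per-character data read from a TFM file. */
typedef struct {
    int bc, ec;                  /* smallest and largest character codes */
    const uint8_t *width_index;  /* width_index[c - bc]; 0 marks a missing character */
} TfmChars;

/* A code is present only if it lies in bc..ec and its width index is
   nonzero.  No separate "missing" bit is stored anywhere. */
static bool tfm_char_exists(const TfmChars *f, int c)
{
    assert(f != NULL);
    if (c < f->bc || c > f->ec)
        return false;
    return f->width_index[c - f->bc] != 0;
}
```

With bc = 65, ec = 67 and width indices {2, 0, 1}, codes 65 and 67 exist, while 66 is treated as missing even though it lies inside the |bc|..|ec| range.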

Because of that little non-orthogonal problem, there's the TFM font's so-called "false" boundary character, which is synthesized when the TFM file is read in.  The false boundary character is the boundary character, unless the boundary character's width is non-zero, in which case the false boundary character is set to a not-a-character value.  The comment in WEB source says it's to prevent "spurious ligatures".  This smells like a hack to me, but perhaps it's elegant.  Again, the problem being solved (I think) is how to introduce a character of zero width into the layout to break a kern or ligature, rather than having it flagged during input as missing from the font before any attempt to append.
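
The rule for synthesizing the false boundary character can be sketched in C as below.  This is a hedged reading of the description above, not DEK's code: FontInfo, font_has_char(), false_bchar() and the NON_CHAR value of 256 are names and choices assumed for the example, and "exists in the font" is modelled as "lies in |bc|..|ec| with a nonzero width index".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NON_CHAR 256  /* assumed sentinel outside the 0..255 TFM code range */

/* Hypothetical summary of what the loader knows after reading a TFM file. */
typedef struct {
    int bc, ec;                  /* smallest and largest character codes */
    const uint8_t *width_index;  /* width_index[c - bc]; 0 marks a missing character */
    int bchar;                   /* declared boundary character, or NON_CHAR if none */
} FontInfo;

static bool font_has_char(const FontInfo *f, int c)
{
    return c >= f->bc && c <= f->ec && f->width_index[c - f->bc] != 0;
}

/* The false boundary character equals the boundary character, unless that
   code names a character that really exists in the font; in that case it
   is set to the not-a-character value. */
static int false_bchar(const FontInfo *f)
{
    assert(f != NULL);
    return font_has_char(f, f->bchar) ? NON_CHAR : f->bchar;
}
```

Under these assumptions, a font with no declared boundary character (bchar equal to NON_CHAR) also ends up with NON_CHAR as its false boundary character, since that code can never lie between |bc| and |ec|.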

DEK uses the phrase "pseudo-ligatures" in a comment, but he never defines the term, and the phrase is not used anywhere else in the TeX code that I can find, so that's not much help.

Anyway, FWIW after a quick flyby of the code.  Because of the complicated nature of the ligature stack and the ligature/kern "program" in the TFM file, I'm probably not explaining stuff going on there very well.  Indeed, the above may be quite wrong.

It seems post-mature optimization is kind of evil too. :-)

Doug McKenna

----- Original Message -----
From: "Karl Berry" <karl at freefriends.org>
To: "Didier Verna" <didier at didierverna.net>
Cc: "texhax" <texhax at tug.org>
Sent: Wednesday, September 18, 2019 4:42:24 PM
Subject: Re: About boundary characters

Hi Didier,

    I have several questions about boundary characters in the TFM format.

I surmise experimentation is necessary. The "specifications", such as
they are, are insufficient, so far as I can tell. (Since they were added
in the 1989 update, Don had only a tiny amount of space in which to
describe them.)

It's never been clear to me what TeX actually does with boundary
characters (so maybe their metrics do not matter?). I believe that they
are only relevant in the ligkern table, but that's about all I know.
I read the descriptions in the {mf,tex}{book,.web}, as I suppose you
have also, but clarity is not forthcoming. As far as I know there is no
other significant source of information.

Doug, I surmise you may have more knowledge than anyone? But maybe your
re-implementation was too long ago now :).

It would be nice to have a thorough article for TUGboat on boundary
characters.

As for what existing fonts may or may not do with them, (1) it's hard to
say anything without knowing what fonts you are talking about, and (2) I
wouldn't take it too seriously. Maybe the font creators did lots of
experiments and created boundary chars the way they did for a specific
reason, but IMHO it's equally likely that they simply followed some
examples, tried to do what they thought made sense, and whatever
happened, happened. --best, karl.
