[XeTeX] Unexpected behaviour

Jonathan Kew jonathan_kew at sil.org
Tue May 16 12:33:06 CEST 2006

On 16 May 2006, at 10:59 am, Wlodek Bzyl wrote:

>    Hi
>    1. To see hyphens that xetex will find
>    in `computer keyboard' I compiled
> -------------------------------
> \showhyphens{computer keyboard}
> \end
> -------------------------------
>    The results appeared on the terminal:
> Underfull \hbox (badness 10000) in paragraph at lines 10--10
> [] \tenrm com-puter key-board
>    But compiling
> ----------------------------------------------
> \font\tenrm="Latin Modern Roman-Regular 10"
> \showhyphens{computer keyboard}
> \end
> -----------------------------------------------
>    produces different results:
> Underfull \hbox (badness 10000) in paragraph at lines 10--10
> [] \tenrm computer keyboard

This is because hyphenation using "native" fonts is handled in a  
different way. XeTeX is not working with the same lists of character  
nodes when it builds a paragraph using OpenType fonts -- it can't,  
because characters cannot be measured in isolation. A side-effect of  
this is that the \showhyphens macro does not give the same results;  
however, if you try typesetting the words "computer keyboard"  
repeatedly in a narrow column width, you'll find that the proper  
hyphenation positions are in fact found and used. They just don't  
show up in the terminal output when TeX is displaying underfull boxes.
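
For example, a minimal sketch of such a test, assuming the font named
in the question is installed; the 1in column width and the loose
\tolerance are arbitrary values chosen only for illustration:

	\font\tenrm="Latin Modern Roman-Regular 10"
	\tenrm            % select the native font explicitly
	\hsize=1in        % narrow column, so line breaks must fall inside words
	\tolerance=10000  % accept very loose lines rather than overfull ones
	\noindent computer keyboard computer keyboard computer keyboard
	computer keyboard computer keyboard computer keyboard
	\bye

Any hyphens that appear at line ends in the typeset output mark break
points that XeTeX found in the native-font text.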

I suppose it could be helpful to have a way of displaying XeTeX's  
hyphenation positions in native-font text, but offhand the only  
approach I can think of is to ask it to typeset the text with a very  
narrow \hsize. Try:

	\def\Xshowhyphens#1{\setbox0=\vbox{\hsize1sp\noindent\hskip0pt #1}}
	\Xshowhyphens{computer keyboard}
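
As a complete test file, this might look like the following sketch.
The explicit \tenrm inside the box is an addition that is not in the
macro above; it assumes the native font has been defined but not yet
selected, since redefining \tenrm does not by itself change the
current font. The font name is the one from the question.

	\font\tenrm="Latin Modern Roman-Regular 10"
	% Variant of the macro above that selects \tenrm inside the box
	% (an assumption for this test; drop it if the font is already current).
	\def\Xshowhyphens#1{\setbox0=\vbox{\tenrm\hsize1sp\noindent\hskip0pt #1}}
	\Xshowhyphens{computer keyboard}
	\end

With a 1sp line width every line is far too wide, so each fragment
should be reported as an overfull box on the terminal and in the log,
and the fragments show where the breaks were taken.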

>    2. If we replace in plain.tex the line
>    \font\tenrm=cmr10 % roman text
>       for
>    \font\tenrm="Latin Modern Roman-Regular 10"
>       then the format file can't be \dumped
>       and the following message appears:
>     ! Can't \dump a format with preloaded native fonts.
>     Any reasons for that?


There would be little benefit to \dumping a format with preloaded  
native fonts, unless the .fmt file somehow embedded virtually the  
entire OpenType font. Otherwise, XeTeX is still going to have to  
locate the font at runtime, and is therefore still going to be  
dependent on the presence of the actual font on the machine, so there  
won't be any significant performance improvement (unlike with TFMs,  
where preloading the font metric data means that TeX can typeset  
without accessing a font file at all).

If we allowed dumping of a format with native fonts defined, that
format file would either have to embed the complete fonts (which I
have not seriously considered -- and it would raise licensing
issues), or it would be "fragile" in that it would appear to define
fonts that might not actually work at runtime (if the host font
environment had changed since the format was dumped).

Better (and safer), I believe, to simply prohibit this. And on
today's machines, the main reason for predefining fonts in the .fmt
file (to avoid the need to locate and read the TFM files) no longer
matters much anyway; computers and disk I/O are so much faster than
they were when Knuth created this mechanism.

(For large macro collections, such as LaTeX, preloading and dumping  
the macro definitions does still provide a major performance boost,  
of course. And XeTeX still supports that.)
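
In practice this means defining native fonts at run time rather than
in the format. A minimal sketch, assuming a hypothetical file name
myfonts.tex and the font from the question:

	% myfonts.tex (hypothetical name): \input at the start of a document
	\font\tenrm="Latin Modern Roman-Regular 10"
	\rm  % plain's \rm selects \tenrm again, which is now the native font

	% document.tex, run with the ordinary dumped plain format
	\input myfonts
	computer keyboard
	\bye

The format itself still contains only TFM-based fonts, so it can be
dumped as usual; the native font is located each time the document is
processed.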

