[XeTeX] strange results of measuring boxes

Tue Jul 28 17:45:07 CEST 2009

Hi Marcin,

You have run into one of the situations where XeTeX "cheats" slightly  
in its effort to merge OpenType font layout with TeX algorithms!

Here's what is happening: in the font you're using, there is a kern  
defined between the characters "GT" in the OpenType data. XeTeX  
automatically recognizes and uses this. However, when there is a  
discretionary break (such as \-) between the characters, this breaks  
the text sequence that is presented to the OpenType layout system, and  
so the kern is not recognized.

When you put the text into an \hbox to measure it, discretionaries are  
discarded (because the hbox won't be subject to line-breaking), and so  
the kern takes effect. But during normal paragraph building, the
discretionary node is present in the hlist that is built, and so the  
kern is not seen. After line-breaking, XeTeX removes all the unused  
discretionaries, and reconstructs the words across those (former)  
boundaries, so that ligatures, kerns, or other OpenType effects are  
applied properly.

So line breaks are chosen on the basis of a slightly "false" set of  
measurements, in the case where discretionaries occur at positions  
where OpenType layout effects should also occur. Normally, this is  
unimportant, as the difference in metrics is very slight and the error  
is absorbed into justification/packing of the line boxes. But in your  
case, where you are trying to fit a sequence of characters with no  
flexibility into a precisely-fitted width, it's a problem; the line  
(as measured in the paragraph's hlist) does not fit correctly into the  
(true, final) width, so the potential breakpoint has infinite badness,  
TeX's secondpass and emergencypass come into effect, and in the end  
the hyphenated solution (with an underfull box) is chosen in  
preference to the apparently-overfull line (which would actually have  
been a perfect fit after removal of the unused breakpoint). :(

This issue has always been present in XeTeX, though you seem to be the  
first user to run into a real-life problem as a result -- sorry. (Or  
at least, the first to report it!) As a workaround, I would suggest  
adding a small "fudge factor" to the measured width of the word; a
little trial and error may be needed to find a suitable amount.
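
A minimal sketch of the fudge-factor workaround in plain XeTeX. The font
name, the test word ABCG\-TXYZ, the register names \wordwidth and \fudge,
and the 0.5pt value are all assumptions made for illustration; they are not
taken from the attached test file. The fudge presumably has to be at least
as large as the kern that goes unseen at the discretionary.

```tex
% Sketch only: font, test word and fudge amount are assumed values.
\font\test="TeX Gyre Heros" at 10pt
\newdimen\wordwidth
\newdimen\fudge  \fudge=0.5pt  % tune by trial and error
\test
% Inside \hbox the discretionary is discarded, so the GT kern is applied.
\setbox0=\hbox{ABCG\-TXYZ}
\wordwidth=\wd0
% Allow for the kern that is not seen while the paragraph is being broken.
\advance\wordwidth by \fudge
\vbox{\hsize=\wordwidth \parindent=0pt \emergencystretch=3em
  ABCG\-TXYZ\par}
\bye
```

Setting \tracingparagraphs=1 before the \vbox shows in the log which pass
finds the breaks, which helps when tuning \fudge.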

(Another "fix" would be to insert something like \hbox{} at the place  
where the discretionary break occurs, to disable the kern there  
regardless of whether the break is chosen or not. But of course  
disabling kerns is not a nice solution, as the result will be inferior  
letter spacing.)
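
A sketch of that second approach, again in plain XeTeX with an assumed font
and a hypothetical test word. The empty box is placed next to the
discretionary in a single macro (\word, a name chosen here), so the
measurement and the paragraph see the same unkerned text and the measured
width should match what the line breaker computes.

```tex
% Sketch only: font and test word are assumed values.
\font\test="TeX Gyre Heros" at 10pt
\test
% \hbox{} splits the run passed to OpenType layout, so no GT kern is
% applied, whether or not the break at \- is taken.
\def\word{ABCG\hbox{}\-TXYZ}
\setbox0=\hbox{\word}
\vbox{\hsize=\wd0 \parindent=0pt \emergencystretch=3em
  \word\par}
\bye
```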

I'd like to fix this properly sometime, but it requires some care to  
handle the interaction between the TeX and font layers of processing.

JK

On 24 Jul 2009, at 17:13, Marcin Woliński wrote:

> Dear XeTeX Gurus!
>
> In an application I'm trying to set the width of a ‘p’ column in LaTeX
> tabular to the minimal width which will accommodate a certain word.
> On the TeX level that means I'm measuring the width of an \hbox
> containing the word, then use this value to set \hsize within a \vbox
> (see the attached file).  In pdfTeX this seems to work reliably.  In
> XeTeX the word sometimes gets broken.  As shown in the example, a rather
> large value of \emergencystretch is necessary (however, the default
> value used by multicol is enough to trigger the problem).  The word
> being measured needs to contain a hyphen, either explicit or
> discretionary.  And it seems that XeTeX has to be using OpenType fonts,
> but then the problem is not specific to TeX Gyre Heros used in the
> example.
>
> Can anyone help in understanding the problem?  Questions:
>
> 1. Do you get a similar result on the test file, that is the test word
> fits in one line in the first \vbox, but gets broken in the second?
> (I'm using svn XeTeX 821).
>
> 2. Why does (Xe)TeX find a solution during @secondpass in the first
> case, but has to resort to @emergencypass in the second?  There seems to
> be no overfull in the first \vbox, although the line is reported as
> ‘tight’ in the trace (why?).  So how come the value of \emergencystretch
> influences the second pass???
>
> 3. Do you see a way of overcoming this problem without setting
> \emergencystretch to 0pt?  Narrow columns in a table are the exact case
> where \emergencystretch comes in handy.
>
> 4. Do you see a reliable method of measuring the minimal width I need?
>
> With best
> Marcin
>
> <testmeasurement.pdf><testmeasurement.log><testmeasurement.tex>