[XeTeX] Discretionary hyphens don't work in paragraphed footnotes

Jonathan Kew jfkthame at gmail.com
Thu Oct 8 18:39:25 CEST 2015

On 8/10/15 16:29, Bruno Le Floch wrote:
> Here's a shorter example which hyphenates with cmr12 (in pdfTeX/XeTeX)
> but not with the font copied from David's example: hyphenation is lost
> when closing the hbox, as can be seen by adding the appropriate
> \tracingonline=1\showboxbreadth=99 and \showlists just before the
> closing brace.
> I have no idea why hyphenation is lost, though.  As far as I can tell,
> in David's example hyphenation is lost once the text is broken into
> lines for the first time when put into a \vbox, while with cmr12 hyphenation
> is kept through further unboxings.
> --
> \ifx\XeTeXversion\undefined
> \font\x=cmr12
> \else
> \font\x="[lmroman12-regular]:mapping=tex-text"
> \fi
> \x
> \setbox0\hbox{XXXXXXXXXXXXX just a few normal words to fill up the line
>    up to my x x x zzzzzz\-zzzzz}
> \unhbox0
> \bye

OK, I think I see what's happening here. When xetex finishes building an 
\hbox, it will drop any discretionaries that occur directly between two 
adjacent runs of characters that use the same OpenType font, and merge 
the preceding and following runs into a single node.

It does this so that OpenType shaping features (ligatures, kerning, or 
more advanced contextual features...) will apply correctly across the 
whole word, rather than being broken at the (presumed unused) 
discretionary break.

The trouble here is that when the \hbox is subsequently unboxed, it 
can't reintroduce the discretionary that was discarded. So when the text 
from the \hbox is then used in forming a new paragraph, it just gets 
automatic hyphenation applied.
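To see this directly, you can inspect the box before it is unboxed, using the tracing settings Bruno mentioned. This is an untested sketch: the font setup is copied from his example, and "of\-fice" is just an arbitrary test word. Under pdfTeX the listing should contain a \discretionary entry. Under XeTeX with the OpenType font, the explanation above predicts that entry is gone and the word appears as a single native_word run:

   % Diagnostic sketch: does the explicit \- survive inside an \hbox?
   \ifx\XeTeXversion\undefined
   \font\x=cmr12
   \else
   \font\x="[lmroman12-regular]:mapping=tex-text"
   \fi
   \tracingonline=1 \showboxbreadth=99 \showboxdepth=99
   \setbox0\hbox{\x of\-fice}
   \showbox0   % look for a \discretionary line in the box listing
   \bye

\showbox stops with "! OK." in the default interaction mode; press return to continue, or run with -interaction=nonstopmode and read the listing in the .log file.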

I suppose to fix this, we'll need to keep track of discretionaries that 
were "elided" from native_word nodes, rather than just discarding them.

A possible workaround would be to define \- such that it always breaks 
xetex's native_word nodes; something like this might work:

   \def\-{\leavevmode \kern0pt \discretionary{-}{}{}}
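Plugged into Bruno's test file, the workaround might look like this. This is a sketch and has not been tested here. Restricting the \def to XeTeX is an assumption of this sketch, on the grounds that pdfTeX does not lose the discretionary in the first place:

   \ifx\XeTeXversion\undefined
   \font\x=cmr12
   \else
   \font\x="[lmroman12-regular]:mapping=tex-text"
   % workaround (XeTeX only): the zero kern splits the native_word run,
   % so the explicit discretionary is not merged away at the end of the \hbox
   \def\-{\leavevmode \kern0pt \discretionary{-}{}{}}%
   \fi
   \x
   \setbox0\hbox{XXXXXXXXXXXXX just a few normal words to fill up the line
      up to my x x x zzzzzz\-zzzzz}
   \unhbox0
   \bye

If this works as intended, "zzzzzz" and "zzzzz" end up in separate native_word nodes. The discretionary between them should then still be present after \unhbox0 and available as a break point when the paragraph is built.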

This means that explicit discretionary hyphens will interfere with 
ligatures and kerning (etc.), but OTOH they already do that in standard TeX:

   \font\x = cmr12
   \x AV office \par      % with kerning and ligatures
   \x A\-V of\-fice \par  % no AV kern, only the "fi" ligature

In comparison, with XeTeX (and without the extra \def suggested above):

   \font\x = "[lmroman12-regular]"
   \x AV office \par      % with kerning and ligatures
   \x A\-V of\-fice \par  % typesets identically

This is generally considered a feature rather than a bug.

But the loss of explicit discretionaries when hboxing and then unhboxing 
text is clearly a problem that we should figure out how to fix.

