# [XeTeX] Discretionary hyphens don't work in paragraphed footnotes

Jonathan Kew jfkthame at gmail.com
Thu Oct 8 18:39:25 CEST 2015

On 8/10/15 16:29, Bruno Le Floch wrote:
> Here's a shorter example which hyphenates with cmr12 (in pdfTeX/XeTeX)
> but not with the font copied from David's example: hyphenation is lost
> when closing the hbox, as can be seen by adding the appropriate
> \tracingonline=1\showboxbreadth=99 and \showlists just before the
> closing brace.
>
> I have no idea why hyphenation is lost, though.  As far as I can tell,
> in David's example hyphenation is lost once the text is broken into
> lines a first time when put into a \vbox, while with cmr12 hyphenation
> is kept through further unboxings.
>
> --
>
>
> \ifx\XeTeXversion\undefined
> \font\x=cmr12
> \else
> \font\x="[lmroman12-regular]:mapping=tex-text"
> \fi
>
> \x
>
> \setbox0\hbox{XXXXXXXXXXXXX just a few normal words to fill up the line
>    up to my x x x zzzzzz\-zzzzz}
>
> \unhbox0
>
> \bye

OK, I think I see what's happening here. When xetex finishes building an
\hbox, it will drop any discretionaries that occur directly between two
adjacent runs of characters that use the same OpenType font, and merge
the preceding and following runs into a single node.

It does this so that OpenType shaping features (ligatures, kerning, or
more advanced contextual features...) will apply correctly across the
whole word, rather than being broken at the (presumed unused)
discretionary break.

The trouble here is that when the \hbox is subsequently unboxed, it
can't reintroduce the discretionary that was discarded. So when the text
from the \hbox is then used in forming a new paragraph, it just gets
automatic hyphenation applied.
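
A minimal way to observe this, cut down from Bruno's example and using the tracing settings he mentions. This is an untested sketch: the shortened box contents and the extra \showboxdepth setting are my own additions. If the explanation above is right, the \showlists output (taken before the box is closed) should still contain a \discretionary node under XeTeX, while the \showbox0 output (taken after) should not.

```tex
% Sketch only: compare the node list before and after the \hbox is closed.
% Run with xetex, then with pdftex for comparison.
\tracingonline=1 \showboxbreadth=99 \showboxdepth=99

\ifx\XeTeXversion\undefined
\font\x=cmr12
\else
\font\x="[lmroman12-regular]:mapping=tex-text"
\fi

\x

% \showlists runs just before the closing brace, i.e. before xetex
% finishes building the box.
\setbox0\hbox{zzzzzz\-zzzzz\showlists}

% \showbox0 displays the finished box; TeX pauses here, press return.
\showbox0

\bye
```

With cmr12 under pdfTeX both listings should show the discretionary, which gives a baseline to compare against.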

I suppose to fix this, we'll need to keep track of discretionaries that
were "elided" from native_word nodes, rather than just discarding them
completely.

A possible workaround would be to define \- such that it always breaks
xetex's native_word nodes; something like this might work:

\def\-{\leavevmode \kern0pt \discretionary{-}{}{}}
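
To try that redefinition on Bruno's test file, something like the following should do. It is a sketch under the same "might work" caveat: only the \def line is new, the rest is his example with the XeTeX font selected directly.

```tex
% Sketch: Bruno's example with the suggested redefinition of \- in effect.
% Intended for xetex only, since the font is loaded with XeTeX syntax.
\font\x="[lmroman12-regular]:mapping=tex-text"

% The zero-width kern keeps the discretionary from sitting directly
% between two character runs, so it should not be dropped when the
% \hbox is finished.
\def\-{\leavevmode \kern0pt \discretionary{-}{}{}}

\x

\setbox0\hbox{XXXXXXXXXXXXX just a few normal words to fill up the line
   up to my x x x zzzzzz\-zzzzz}

\unhbox0

\bye
```

If the workaround behaves as hoped, the break in the last word should fall at the explicit \- rather than wherever automatic hyphenation would put it.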

This means that explicit discretionary hyphens will interfere with
ligatures and kerning (etc), but OTOH they already do that in standard
TeX, AFAICT:

\font\x = cmr12
\x AV office \par      % with kerning and ligatures
\x A\-V of\-fice \par  % no AV kern, only the "fi" ligature
\end

In comparison, with XeTeX (and without the extra \def suggested above):

\font\x = "[lmroman12-regular]"
\x AV office \par      % with kerning and ligatures
\x A\-V of\-fice \par  % typesets identically
\end

This is generally considered a feature rather than a bug.

But the loss of explicit discretionaries when hboxing and then unhboxing
text is clearly a problem that we should figure out how to fix.

JK