[XeTeX] \XeTeXdashbreakstate=1

Karl Berry karl at freefriends.org
Wed Apr 11 00:44:09 CEST 2012


Barring major objections, I plan to have TL set
\XeTeXdashbreakstate=1 in xe(la)tex.ini this year.  For those who don't
know about this obscure parameter -- it allows line breaks after
em/en-dashes.

Pro: this has always been the behavior of traditional TeX.  It is also
the behavior of LuaTeX.  So it is more compatible for XeTeX to operate
in the same way.

Con: it is not the way XeTeX has operated to date.  So existing XeTeX
documents may see their line breaks change out from under them.  Of
course, they can set \XeTeXdashbreakstate=0 to restore previous behavior.
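
For illustration, a document that wants stable line breaks either way
could set the value itself rather than rely on the format default.  This
is a minimal sketch, not part of the proposed xe(la)tex.ini change; the
\ifdefined guard is an addition here so the same file still runs under
other engines.

```tex
% Only XeTeX defines this parameter, so guard the assignment.
\ifdefined\XeTeXdashbreakstate
  \XeTeXdashbreakstate=0   % previous XeTeX behavior: no break after en/em-dashes
  % \XeTeXdashbreakstate=1 % alternative: allow a break after en/em-dashes
\fi
```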

Jonathan did not know of any specific reason why he had not set it
from the start.

Below is mail from Jonathan and Khaled with technical details.

Best,
Karl


Date: Sun, 08 Apr 2012 12:27:05 +0100
From: Jonathan Kew <jfkthame at googlemail.com>
To: Karl Berry <karl at freefriends.org>
CC: khaledhosny at eglug.org
Subject: Re: losing breakpoint after em-dash

On 8/4/12 01:39, Karl Berry wrote:
> [...] I discovered a strange
> line-breaking discrepancy between XeTeX and LuaTeX: XeTeX disallowed a
> line break after an em-dash, while LuaTeX allowed it.  Traditional
> behavior has always had a breakpoint there (a \discretionary).
>
> Khaled kindly looked into it for me, and observed that XeTeX does not
> reconstitute the hyphenation point after the dash.  See his email below,
> plus input and log files.
>
> [...] figured you could shed
> light on whether this behavior was intentional for some reason, or if
> it's "just" a bug.  On the face of it, the difference in behavior (and
> break with the past, no pun intended :) seems undesirable.
>
> Thanks,
> Karl
>

If you want automatic insertion of a discretionary break after en- and 
em-dash (like explicit hyphen), set \XeTeXdashbreakstate=1.

(The traditional behavior arises because of the use of a ligature of 
hyphens to create the dashes, so the "post-dash" break is really a 
post-hyphen break. When using the literal Unicode dash characters, this 
no longer happens implicitly as a side-effect of the representation of 
the dash, so \XeTeXdashbreakstate lets you extend the hyphen behavior 
explicitly to the dashes.)
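
As a rough illustration of the mechanism described above: traditional TeX
adds an empty discretionary after a hyphen character (or a ligature
ending in one), so a comparable breakpoint can also be requested by hand
after a dash.  The sketch below assumes plain XeTeX and reuses the font
line and sample sentence from the test file further down; whether the
explicit \discretionary gives the desired result with a particular font
setup is an assumption, not something verified here.

```tex
% Plain XeTeX sketch; run with: xetex file.tex
\font\lmr="Latin Modern Roman:mapping=tex-text" at 10pt
\lmr \hsize=12cm

% (a) Explicit empty discretionary after the dash, written by hand.
{\XeTeXdashbreakstate=0
The basic text family is LucidaBrightOT, with the usual four
variants---\discretionary{}{}{}regular, italic, bold, and bold italic; small\par}

% (b) Let XeTeX supply the break after every en- and em-dash.
{\XeTeXdashbreakstate=1
The basic text family is LucidaBrightOT, with the usual four
variants---regular, italic, bold, and bold italic; small\par}
\bye
```

Each paragraph ends with \par inside its group so that the local setting
is still in effect when the paragraph is broken into lines.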

JK


Date: Sat, 31 Mar 2012 10:17:52 +0200
From: Khaled Hosny <khaledhosny at eglug.org>
To: Karl Berry <karl at freefriends.org>
Subject: Re: --- allowed line break per engine

[...]
the difference seems to be that, for OpenType processing, XeTeX converts
each word into a special whatsit node (called native word) that is then
processed by the layout engine and passed back to TeX, and here the
em-dash is considered part of the word, so "variants---regular" is a
single native word node, and it seems XeTeX does not make the dash a
hyphenation point (XeTeX takes care of inserting hyphenation points
inside native word nodes).

None of this happens with LuaTeX as the OpenType processing is all done
directly on TeX nodes by Lua code.
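
One way to look at this directly is to typeset the word into a box and
dump the box contents.  This is a sketch assuming plain XeTeX and the
same font line as in the test file below; the exact form of the log
output is not shown here because it may differ between versions.

```tex
\font\lmr="Latin Modern Roman:mapping=tex-text" at 10pt
\setbox0=\hbox{\lmr variants---regular}
\showboxbreadth=100 \showboxdepth=100  % show the full box contents
\tracingonline=1                        % also print to the terminal
\showbox0                               % TeX pauses here; press Enter to continue
\bye
```

If the description above holds, XeTeX should list the whole of
"variants---regular" as one entry in the box, with no discretionary
after the dash.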

Regards,
 Khaled

-----------------------------------------------------------------------------
\input ifluatex.sty
\input ifxetex.sty
\ifluatex
  \input luaotfload.sty
  \font\lmr="Latin Modern Roman:+tlig" at 10pt
\else
  \font\lmr="Latin Modern Roman:mapping=tex-text" at 10pt
\fi
\output{\shipout\box255}
\hsize = 12cm
\tracingall

The basic text family is LucidaBrightOT, with the usual four
variants---regular, italic, bold, and bold italic; small
\end
-----------------------------------------------------------------------------
This is XeTeX, Version 3.1415926-2.3-0.9997.6 (TeX Live 2012/dev) (format=xetex 2012.3.20)  31 MAR 2012 10:10
entering extended mode
 restricted \write18 enabled.
 %&-line parsing enabled.
**hh
(./hh.tex
(/media/sda8/tex/texlive/2011/texmf-dist/tex/generic/oberdiek/ifluatex.sty
Package: ifluatex 2010/03/01 v1.3 Provides the ifluatex switch (HO)
Package ifluatex Info: LuaTeX not detected.
)
(/media/sda8/tex/texlive/2011/texmf-dist/tex/generic/ifxetex/ifxetex.sty)
{vertical mode: \tracingstats}
{\tracingpages}
{\tracingoutput}
{\tracinglostchars}
{\tracingmacros}
{\tracingparagraphs}
{\tracingrestores}
{\showboxbreadth}
{\showboxdepth}
{\errorstopmode}

{\tracinggroups}
{\tracingifs}
{\tracingscantokens}
{\tracingnesting}
{\tracingassigns}
{into \tracingassigns=2}
{\par}
{\hsize}
{changing \hsize=469.75499pt}
{into \hsize=341.43306pt}
{select font "Latin Modern Roman 10 Regular:mapping=tex-text"}
{changing current font=\tenrm}
{into current font=\lmr}
{the letter T}
{horizontal mode: the letter T}
{blank space  }
{the letter b}
{blank space  }
{the letter t}
{blank space  }
{the letter f}
{blank space  }
{the letter i}
{blank space  }
{the letter L}
{blank space  }
{the letter w}
{blank space  }
{the letter t}
{blank space  }
{the letter u}
{blank space  }
{the letter f}
{blank space  }
{the letter v}
{blank space  }
{the letter i}
{blank space  }
{the letter b}
{blank space  }
{the letter a}
{blank space  }
{the letter b}
{blank space  }
{the letter i}
{blank space  }
{the letter s}
{blank space  }
{\end}
{\par}
@firstpass
@secondpass
[]\lmr The ba-sic text fam-ily is Lu-cidaBrightOT, with the usual four vari-ant
s—regular, 
@ via @@0 b=* p=0 d=*
@@1: line 1.3 t=0 -> @@0
italic, bold, and bold italic; small 
@\par via @@1 b=0 p=-10000 d=*
@@2: line 2.2- t=0 -> @@1


Overfull \hbox (18.12695pt too wide) in paragraph at lines 13--15
[]\lmr The basic text family is LucidaBrightOT, with the usual four variants—
regular,|

\hbox(7.05+2.05998)x341.43306, glue set - 1.0
.\hbox(0.0+0.0)x20.0
.\lmr The
.\glue 3.33 plus 1.665 minus 1.11
.\lmr basic
.\glue 3.33 plus 1.665 minus 1.11
.\lmr text
.\glue 3.33 plus 1.665 minus 1.11
.\lmr family
.\glue 3.33 plus 1.665 minus 1.11
.\lmr is
.\glue 3.33 plus 1.665 minus 1.11
.\lmr LucidaBrightOT,
.\glue 3.33 plus 1.665 minus 1.11
.\lmr with
.\glue 3.33 plus 1.665 minus 1.11
.\lmr the
.\glue 3.33 plus 1.665 minus 1.11
.\lmr usual
.\glue 3.33 plus 1.665 minus 1.11
.\lmr four
.\glue 3.33 plus 1.665 minus 1.11
.\lmr variants—regular,
.\glue(\rightskip) 0.0
.\rule(*+*)x5.0

%% goal height=643.20255, max depth=4.0
% t=10.0 g=643.20255 b=10000 p=300 c=100000#
{vertical mode: \end}
% t=23.92998 g=643.20255 b=10000 p=0 c=100000#
% t=23.92998 plus 1.0fill g=643.20255 b=0 p=-1073741824 c=-1073741824#
{globally changing \outputpenalty=0}
{into \outputpenalty=-1073741824}
\output->{\shipout \box 255}
{entering output group (level 1) at line 15}
{internal vertical mode: \shipout}

Completed box being shipped out [1]
\vbox(643.20255+0.0)x341.43306, glue set 619.27257fill
.\glue(\topskip) 2.95
.\hbox(7.05+2.05998)x341.43306, glue set - 1.0
.\hbox(0.0+0.0)x20.0
.\lmr The
.\glue 3.33 plus 1.665 minus 1.11
.\lmr basic
.\glue 3.33 plus 1.665 minus 1.11
.\lmr text
.\glue 3.33 plus 1.665 minus 1.11
.\lmr family
.\glue 3.33 plus 1.665 minus 1.11
.\lmr is
.\glue 3.33 plus 1.665 minus 1.11
.\lmr LucidaBrightOT,
.\glue 3.33 plus 1.665 minus 1.11
.\lmr with
.\glue 3.33 plus 1.665 minus 1.11
.\lmr the
.\glue 3.33 plus 1.665 minus 1.11
.\lmr usual
.\glue 3.33 plus 1.665 minus 1.11
.\lmr four
.\glue 3.33 plus 1.665 minus 1.11
.\lmr variants—regular,
.\glue(\rightskip) 0.0
.\rule(*+*)x5.0
.\penalty 300
.\glue(\baselineskip) 3.00002
.\hbox(6.94+1.92998)x341.43306, glue set 195.79306fil
.\lmr italic,
.\glue 3.33 plus 2.08124 minus 0.888
.\lmr bold,
.\glue 3.33 plus 2.08124 minus 0.888
.\lmr and
.\glue 3.33 plus 1.665 minus 1.11
.\lmr bold
.\glue 3.33 plus 1.665 minus 1.11
.\lmr italic;
.\glue 3.33 plus 2.49748 minus 0.73999
.\lmr small
.\penalty 10000
.\glue(\parfillskip) 0.0 plus 1.0fil
.\glue(\rightskip) 0.0
.\hbox(0.0+0.0)x341.43306
.\glue 0.0 plus 1.0fill

Memory usage before: 594&10194; after: 327&10194; still untouched: 2988772
{end-group character }}
{leaving output group (level 1) entered at line 15}
{vertical mode: \end}
 ) 
Here is how much of TeX's memory you used:
 33 strings out of 495895
 729 string characters out of 3167218
 11228 words of memory out of 3000000
 1503 multiletter control sequences out of 15000+200000
 4754 words of font info for 17 fonts, out of 3000000 for 9000
 1130 hyphenation exceptions out of 8191
 6i,0n,4p,108b,12s stack positions out of 5000i,500n,10000p,200000b,50000s

Output written on hh.pdf (1 page).

