[XeTeX] xetex crash: interaction between interchar and linebreaklocale mechanisms

Andrew Goldstone andrew.goldstone at gmail.com
Wed Sep 6 16:40:16 CEST 2023


Hello: I am attempting to assist a colleague, who is new to TeX, in
typesetting a text which includes many passages in which Burmese and Latin
scripts are closely intermixed. I wanted to make it possible for my
colleague to enter his text fairly naturally, as he is used to doing in
Word, by simply mixing the scripts, rather than having to type a macro to switch
languages/fonts at nearly every word. On tex.stackexchange I found a
suggestion to use XeTeX's interchar mechanism for this purpose and adapted
the code example to my own purposes.

Though this works fine on its own, it leads to problems, and sometimes
crashes, in conjunction with two other desirable XeTeX features, namely its
linebreak-locale and interword space-shaping mechanisms. The example below
my signature demonstrates the following three-way interaction:

(A) \XeTeXlinebreaklocale "my"
(B) \XeTeXinterwordspaceshaping=2
(C) \XeTeXinterchartokenstate=1 (and accompanying char. class definitions)

A       some ligatures render incorrectly, e.g. lla လ္ +လ
B       ok, but must use explicit \selectlanguage{burmese}
C       ok, but Burmese lines only broken on spaces (unidiomatic)
A+B     ok, but must use explicit \selectlanguage{burmese}
A+C     ligature renders incorrectly
B+C     segfault if more than one switch to Burmese
A+B+C   segfault if more than one switch to Burmese

My system is macOS 13.5 on Apple M1 Pro, XeTeX 3.141592653-2.6-0.999995
(TeX Live 2023).

I can certainly help my colleague work around the crashing bug by
postprocessing his source with a script to insert \selectlanguage{} next to
the appropriate Unicode range, but the crash is frustrating. I believe this
is the same issue as was raised on StackExchange in 2019

https://tex.stackexchange.com/questions/503498/trouble-with-stacked-consonants-burmese-script

but I couldn't find any further discussion of a fix for the crash.
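
The postprocessing workaround mentioned above could look roughly like the
following Python sketch. It is only an illustration under stated assumptions:
the function name wrap_burmese is hypothetical, only the basic Myanmar block
(U+1000 to U+109F) is matched, and each run is wrapped in a brace group
containing \selectlanguage{burmese} so that the switch ends at the closing
brace. Whitespace between two Burmese runs is kept inside the group.

```python
import re
import sys

# Runs of Myanmar-block characters (U+1000..U+109F), optionally joined by
# spaces or tabs. Assumption: the extended Myanmar blocks are not needed.
BURMESE_RUN = re.compile(r"[\u1000-\u109F]+(?:[ \t]+[\u1000-\u109F]+)*")


def wrap_burmese(text, language="burmese"):
    """Wrap each Burmese run in a brace group that selects the language."""
    opening = "{\\selectlanguage{" + language + "}"
    closing = "}"
    # A function replacement avoids backslash processing in the output.
    return BURMESE_RUN.sub(lambda m: opening + m.group(0) + closing, text)


if __name__ == "__main__":
    # Use as a filter: read the source on stdin, write the result to stdout.
    sys.stdout.write(wrap_burmese(sys.stdin.read()))
```

The sketch does not parse TeX, so it would also wrap Burmese text that
appears in comments or macro arguments, and it would only be used with the
interchar definitions (C) switched off.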

Many thanks for any help: perhaps I've come at this all wrong. My own XeTeX
experience has almost all been in the Latin alphabet. Best,
Andrew Goldstone

PS: my example document follows (forgive the verbosity). The two Burmese words are just
taken at random from my colleague's sample text, with the first repeated to
fill out a line.

\documentclass[draft,12pt]{article}
\usepackage[english]{babel}
\babelprovide[import]{burmese}
\babelfont[burmese]{rm}{Noto Serif Myanmar Regular}

\XeTeXlinebreaklocale "my"     % (A)
\XeTeXinterwordspaceshaping=2  % (B)

% (C)...

\newXeTeXintercharclass\burmesesub
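% Assign every code point in the Myanmar block (U+1000..U+109F) to the class.
\newcount\myCount
\myCount="1000
\loop\ifnum\myCount<"10A0  % strict upper bound, so U+109F is included
  \XeTeXcharclass\myCount=\burmesesub
  \advance\myCount by 1
\repeat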

\XeTeXinterchartoks 0 \burmesesub = {\begingroup\selectlanguage{burmese}}
\XeTeXinterchartoks 4095 \burmesesub = {\begingroup\selectlanguage{burmese}}
\XeTeXinterchartoks \burmesesub 0 = {\endgroup}
\XeTeXinterchartoks \burmesesub 4095 = {\endgroup}

\XeTeXinterchartokenstate=1

% ...(C)

\begin{document}


ထက်လုလ္လ
thak·lulla
ထက်လုလ္လ
thak·lulla
ထက်လုလ္လ
thak·lulla
ထက်လုလ္လ
thak·lulla

သည် ၊ saññ·|

\end{document}