[XeTeX] xetex crash: interaction between interchar and linebreaklocale mechanisms

Andrew Goldstone andrew.goldstone at gmail.com
Wed Sep 6 20:38:48 CEST 2023


Thank you for the hint about ucharclasses! That saves my writing the
\XeTeXinterchartoks lines myself and does (rather mysteriously?) seem to
avoid the segfault in conjunction with \XeTeXinterwordspaceshaping=2. The
output with \XeTeXlinebreaklocale "my" still looks wrong--it breaks a
ligature (i.e. a conjunct consonant) apart at a line break--but this is much
closer to what my colleague needs. Thanks again. I hope someone may be able
to add more about the Burmese-specific aspect of all this. All best,
Andrew
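
For readers following along, a minimal and untested sketch of what the
ucharclasses approach might look like. It assumes that the package exposes
the U+1000..U+109F block under the name "Myanmar" and that switching back
with \selectlanguage{english} is acceptable; neither detail is confirmed in
this thread, and the exact settings used above are not shown.

```latex
\documentclass[12pt]{article}
\usepackage[english]{babel}
\babelprovide[import]{burmese}
\babelfont[burmese]{rm}{Noto Serif Myanmar Regular}

\XeTeXinterwordspaceshaping=2  % (B), as in the original example

% ucharclasses assigns character classes per Unicode block and sets up
% the \XeTeXinterchartoks transitions, replacing the hand-written loop.
\usepackage{ucharclasses}

% Assumption: "Myanmar" is the block name for U+1000..U+109F.
% First argument: tokens inserted when entering the block;
% second argument: tokens inserted when leaving it.
\setTransitionsFor{Myanmar}%
  {\selectlanguage{burmese}}%
  {\selectlanguage{english}}

\begin{document}
ထက်လုလ္လ thak·lulla သည် ၊ saññ·|
\end{document}
```

Whether this avoids the crash in every case, and how it interacts with
\XeTeXlinebreaklocale, would need to be checked against the real text.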

On Wed, Sep 6, 2023 at 12:33 PM Shree Devi Kumar <shreeshrii at gmail.com>
wrote:

> You can try https://github.com/Pomax/ucharclasses
>
> I have used it in past with Devanagari, Tamil, Gujarati scripts and
> English.
>
> On Wed, Sep 6, 2023, 11:23 AM Andrew Goldstone <andrew.goldstone at gmail.com>
> wrote:
>
>> Hello: I am attempting to assist a colleague, who is new to TeX, in
>> typesetting a text which includes many passages in which Burmese and Latin
>> scripts are closely intermixed. I wanted to make it possible for my
>> colleague to enter his text fairly naturally, as he is used to doing in
>> Word, by simply mixing the scripts, rather than having to type a macro to switch
>> languages/fonts at nearly every word. On tex.stackexchange I found a
>> suggestion to use XeTeX's interchar mechanism for this purpose and adapted
>> the code example to my own purposes.
>>
>> Though this works fine on its own, it leads to problems, and sometimes
>> crashes, in conjunction with two other desirable XeTeX features, namely its
>> linebreak-locale and interword space-shaping mechanisms. The example below
>> my signature demonstrates the following three-way interaction:
>>
>> (A) \XeTeXlinebreaklocale "my"
>> (B) \XeTeXinterwordspaceshaping=2
>> (C) \XeTeXinterchartokenstate=1 (and accompanying char. class definitions)
>>
>> A       some ligatures render incorrectly, e.g. lla လ္ +လ
>> B       ok, but must use explicit \selectlanguage{burmese}
>> C       ok, but Burmese lines only broken on spaces (unidiomatic)
>> A+B     ok, but must use explicit \selectlanguage{burmese}
>> A+C     ligature renders incorrectly
>> B+C     segfault if more than one switch to Burmese
>> A+B+C   segfault if more than one switch to Burmese
>>
>> My system is macOS 13.5 on Apple M1 Pro, XeTeX 3.141592653-2.6-0.999995
>> (TeX Live 2023).
>>
>> I can certainly help my colleague work around the crashing bug by
>> postprocessing his source with a script to insert \selectlanguage{} next to
>> the appropriate Unicode range, but the crash is frustrating. I believe this
>> is the same issue as was raised on StackExchange in 2019
>>
>>
>> https://tex.stackexchange.com/questions/503498/trouble-with-stacked-consonants-burmese-script
>>
>> but I couldn't find any further discussion of a fix for the crash.
>>
>> Many thanks for any help: perhaps I've come at this all wrong. My own
>> XeTeX experience has almost all been in the Latin alphabet. Best,
>> Andrew Goldstone
>>
>> PS my example script--forgive the verbosity. The two Burmese words are
>> just taken at random from my colleague's sample text, with the first
>> repeated to fill out a line.
>>
>> \documentclass[draft,12pt]{article}
>> \usepackage[english]{babel}
>> \babelprovide[import]{burmese}
>> \babelfont[burmese]{rm}{Noto Serif Myanmar Regular}
>>
>> \XeTeXlinebreaklocale "my"     % (A)
>> \XeTeXinterwordspaceshaping=2  % (B)
>>
>> % (C)...
>>
>> \newXeTeXintercharclass\burmesesub
>> \newcount\myCount
>> \myCount="1000
>> \loop\ifnum\myCount<"10A0  % Myanmar block is U+1000..U+109F inclusive
>>   \XeTeXcharclass\myCount=\burmesesub
>>   \advance\myCount by 1
>> \repeat
>>
>> \XeTeXinterchartoks 0 \burmesesub = {\begingroup\selectlanguage{burmese}}
>> \XeTeXinterchartoks 4095 \burmesesub = {\begingroup\selectlanguage{burmese}}
>> \XeTeXinterchartoks \burmesesub 0 = {\endgroup}
>> \XeTeXinterchartoks \burmesesub 4095 = {\endgroup}
>>
>> \XeTeXinterchartokenstate=1
>>
>> % ...(C)
>>
>> \begin{document}
>>
>>
>> ထက်လုလ္လ
>> thak·lulla
>> ထက်လုလ္လ
>> thak·lulla
>> ထက်လုလ္လ
>> thak·lulla
>> ထက်လုလ္လ
>> thak·lulla
>>
>> သည် ၊ saññ·|
>>
>> \end{document}
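
The workaround mentioned in the message, explicit language switches instead
of the interchar mechanism, could be sketched roughly as follows. This is an
untested illustration: the macro name \burmesetext is hypothetical and not
from the thread, and it simply wraps the same \begingroup\selectlanguage ...
\endgroup tokens that the interchar definitions insert. It corresponds to the
A+B row of the table, with (C) left out.

```latex
\documentclass[12pt]{article}
\usepackage[english]{babel}
\babelprovide[import]{burmese}
\babelfont[burmese]{rm}{Noto Serif Myanmar Regular}

\XeTeXlinebreaklocale "my"     % (A)
\XeTeXinterwordspaceshaping=2  % (B)

% Hypothetical helper: switch to Burmese for the argument only.
% A postprocessing script could wrap each run of U+1000..U+109F
% characters in this macro.
\newcommand*\burmesetext[1]{%
  \begingroup\selectlanguage{burmese}#1\endgroup}

\begin{document}
\burmesetext{ထက်လုလ္လ} thak·lulla
\burmesetext{သည် ၊} saññ·|
\end{document}
```

Because \XeTeXinterchartokenstate is never enabled here, the B+C and A+B+C
combinations reported as crashing are not exercised.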