[tex-live] Duplicate Thai patterns reported by XeTeX in TL 2015 pretest

Joseph Wright joseph.wright at morningstar2.co.uk
Tue Apr 14 18:53:24 CEST 2015

On 14/04/2015 12:51, Ulrike Fischer wrote:
> Am Tue, 14 Apr 2015 10:27:30 +0200 schrieb Mojca Miklavec:
>
>>> ! Duplicate pattern.
>>> l.44 ก4ข
>
>> It's weird because I don't see the pattern "ก4ข" appear anywhere.
>> There are these patterns:
>> 35 ก4ขค
>> 37 ก4ขช
>> 39 ก4ขณ
>> 42 ก4ขบ
>> 44 ก4ขภ
>> 46 ก4ขม
>> 52 ก4ขเ
>> 55 ก4ข์
>> but I don't see the isolated string reported as a duplicate anywhere.
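To double-check that reading of the pattern list, here is a minimal Python sketch (not part of TeX Live) of the comparison behind the error. As far as I know, TeX strips the digits from a pattern, maps each remaining character through \lccode, and reports "Duplicate pattern" if that letter string has already been entered. The `lccode` dictionary is a hypothetical stand-in for the real \lccode table. With the identity mapping, none of the quoted patterns collide.

```python
import re

def letter_key(pattern, lccode=None):
    # Remove the digits (inter-letter hyphenation values), then map each
    # remaining character through a lowercase table. `lccode` is a
    # hypothetical stand-in for TeX's \lccode; identity if omitted.
    lccode = lccode or {}
    letters = re.sub(r"[0-9]", "", pattern)
    return "".join(lccode.get(ch, ch) for ch in letters)

def find_duplicates(patterns, lccode=None):
    """Return (first_line, dup_line, key) for each letter string seen twice.

    `patterns` is a list of (line_number, pattern) pairs.
    """
    seen = {}
    dups = []
    for line, pat in patterns:
        key = letter_key(pat, lccode)
        if key in seen:
            dups.append((seen[key], line, key))
        else:
            seen[key] = line
    return dups

# The patterns quoted above, as (line number, pattern).
quoted = [
    (35, "ก4ขค"), (37, "ก4ขช"), (39, "ก4ขณ"), (42, "ก4ขบ"),
    (44, "ก4ขภ"), (46, "ก4ขม"), (52, "ก4ขเ"), (55, "ก4ข์"),
]

print(find_duplicates(quoted))  # []
```

Passing a mapping in which two characters share a lowercase code (for example `{"B": "b"}` with patterns `ab` and `a1B`) shows how a change to the case tables, rather than to the pattern file itself, could trigger the error.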
>
> I looked a bit at the differences between unicode-letters.tex and .def,
> and there are some more entries in .def. The most probable suspect
> is, imho,
>
>  \L 0E4C 0E4C 0E4C
>
> which didn't exist in .tex and which isn't a letter but a mark:
> http://www.fileformat.info/info/unicode/char/0e4c/index.htm
>
> This Unicode character is used above, in line 55.

If you read through all of the set-up in unicode-letters.tex, you'll find

\def\l #1 {\L #1 #1 #1 } % letter without case mappings
\let\m=\l % combining mark - treated as uncased letter

then later

\m 0E4C

i.e. 0E4C ends up as

\L 0E4C 0E4C 0E4C

but at the point of reading the file rather than writing it. In
unicode-letters.def I've tried to optimise for reading, so the work is
done in the processing step (as only I or another member of the team
have to sit through that).
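The difference between expanding at read time (.tex) and precomputing when the file is written (.def) can be sketched in Python. This is only an illustration that assumes the two shorthands behave exactly as quoted above; `expand` and `precompute` are hypothetical helpers, not the actual script used to generate unicode-letters.def, and the sketch does not model what \L itself does.

```python
def expand(line: str) -> str:
    """Rewrite one unicode-letters.tex-style line as an \\L triple.

    Models only the shorthands quoted above:
      \\l #1  ->  \\L #1 #1 #1   (letter without case mappings)
      \\m     is \\let to \\l     (combining mark, treated as uncased letter)
    Any other line (e.g. an explicit \\L entry) is returned unchanged.
    """
    cmd, _, rest = line.partition(" ")
    code = rest.strip()
    if cmd in ("\\l", "\\m"):
        return f"\\L {code} {code} {code}"
    return line

def precompute(lines):
    # Do the expansion once, when the .def file is generated, so that
    # reading it later needs no extra macro processing.
    return [expand(line) for line in lines]

for out in precompute([r"\m 0E4C"]):
    print(out)  # \L 0E4C 0E4C 0E4C
```

Either way 0E4C ends up with the same \L entry; the .def file simply records the result instead of recomputing it on every run.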

It would not surprise me if there are errors in the set-up at the
moment, but this one looks OK to me. As a check:

\showthe\catcode"0E4C %

This is XeTeX, Version 3.14159265-2.6-0.99991 (TeX Live 2014/W32TeX)