[tex-live] TL expl3 update broke a mwe for me
Ingo Krabbe
ikrabbe.ask at gmail.com
Sun Jan 3 09:02:18 CET 2016
There it is: when LuaTeX reads CaseFolding.txt, it fails on a UTF-8 sequence inside a comment!
There are two errors. First, the LuaTeX parser should not try to read "control sequences" in comments, should it? Second, a file that contains ASCII representations of Unicode characters should only have comments in the 7-bit ASCII range.
Replace line 20 of CaseFolding.txt with, for example,
# full case foldings are superior: for example, they allow "MASSE" and "Ma(sharp s)e" to match.
and the LuaTeX error is gone.
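The second point can be checked mechanically. A minimal sketch in Python, assuming that "#" starts a comment (whole-line or trailing) as in the Unicode data files; non_ascii_comments is a hypothetical helper, and the sample lines only imitate CaseFolding.txt:

```python
def non_ascii_comments(lines):
    """Return (line number, line) for each line whose comment part contains non-ASCII text."""
    hits = []
    for number, line in enumerate(lines, start=1):
        # Assumption: everything after the first '#' is a comment.
        _, hash_sign, comment = line.partition("#")
        if hash_sign and not comment.isascii():
            hits.append((number, line.rstrip("\n")))
    return hits


# Sample lines modeled on CaseFolding.txt (not the real file).
sample = [
    '# full case foldings are superior: for example, they allow "MASSE" and "Maße" to match.\n',
    "0041; C; 0061; # LATIN CAPITAL LETTER A\n",
]
for number, line in non_ascii_comments(sample):
    print(f"{number}: {line}")
```

To check the real file, pass the lines of open("CaseFolding.txt", encoding="utf-8") instead of the sample list.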
regards,
ingo
> Ah, no, I get the same error. But why latin9 with luainputenc anyway?
>
> It seems to be broken. Do you really need latin-X?
>
> Convert all your source files to UTF-8 (with iconv, for example) and leave the bad old code pages to people who still run Windows 95 or whatever.
>
> regards
>
> ingo
>
>
>>> ! Undefined control sequence.
>>> <argument> ...or: for example, they allow "MASSE" and "MaÃ
>>> e" to match.
>>> l.4195 \__unicode_map_inline:n { CaseFolding.txt }
>>
>> This looks like an encoding error. It would help if you piped the strange output through od or xxd, for example.
>>
>> Your non-ASCII sequence seems to be C3 83 C2 9F, which looks like a double UTF-8 encoding or something similar. I would bet that the encoding of your mail, of your system, or of the CaseFolding.txt file is wrong.
>>
>> Written in binary, your bytes are:
>>
>> 11000011 10000011
>>
>> and
>>
>> 11000010 10011111
>>
>> which quickly decode to Unicode code points, assuming (a guess) the UTF-8 encoding with these rules:
>>
>> 01. x in [000000.00000000.0bbbbbbb] → 0bbbbbbb
>> 10. x in [000000.00000bbb.bbbbbbbb] → 110bbbbb, 10bbbbbb
>> 11. x in [000000.bbbbbbbb.bbbbbbbb] → 1110bbbb, 10bbbbbb, 10bbbbbb
>> 100. x in [0bbbbb.bbbbbbbb.bbbbbbbb] → 11110bbb, 10bbbbbb, 10bbbbbb, 10bbbbbb
>>
>> Here we only need the 2nd (10) rule.
>>
>> decode_utf8(11000011 10000011) = 000 1100 0011 = C3
>> decode_utf8(11000010 10011111) = 000 1001 1111 = 9F
>>
>> The result, C3 9F, is again a UTF-8 sequence (guessed again):
>>
>> decode_utf8(11000011 10011111) = 000 1101 1111 = DF
>>
>> U+00DF = ß (LATIN SMALL LETTER SHARP S)
>>
>> So the comment says that "MASSE" and "Maße" match.
>>
>> First guess: what is your system encoding? Most systems use UTF-8 nowadays. Check your locale by simply typing locale. This is the output on my system:
>>
>> # locale
>> LANG=en_US.UTF-8
>> LC_CTYPE=de_DE.UTF-8
>> LC_NUMERIC=de_DE.UTF-8
>> LC_TIME=de_DE.UTF-8
>> LC_COLLATE=de_DE.UTF-8
>> LC_MONETARY=de_DE.UTF-8
>> LC_MESSAGES="en_US.UTF-8"
>> LC_PAPER="en_US.UTF-8"
>> LC_NAME="en_US.UTF-8"
>> LC_ADDRESS="en_US.UTF-8"
>> LC_TELEPHONE="en_US.UTF-8"
>> LC_MEASUREMENT="en_US.UTF-8"
>> LC_IDENTIFICATION="en_US.UTF-8"
>> LC_ALL=
>>
>> Try your example with a UTF-8 system encoding.
>>
>> regards
>>
>> ingo
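The two-step decoding of C3 83 C2 9F can be replayed in Python. A minimal sketch: decode_utf8_2byte is a hypothetical helper that implements only rule 10 of the table. The reproduction assumes the UTF-8 bytes of ß were read as Latin-9 (ISO-8859-15) and encoded to UTF-8 a second time; that would fit the latin9 luainputenc setup mentioned above, but it is an unverified guess.

```python
# Replays the manual decoding of C3 83 C2 9F and shows one way such bytes can arise.

def decode_utf8_2byte(lead: int, cont: int) -> int:
    """Decode a two-byte UTF-8 sequence (rule 10: 110bbbbb 10bbbbbb)."""
    if lead >> 5 != 0b110 or cont >> 6 != 0b10:
        raise ValueError(f"not a two-byte UTF-8 sequence: {lead:02X} {cont:02X}")
    # 5 payload bits from the lead byte, 6 from the continuation byte
    return ((lead & 0b11111) << 6) | (cont & 0b111111)


raw = bytes([0xC3, 0x83, 0xC2, 0x9F])  # the bytes seen in the error message

# First pass: code points C3 and 9F, which again look like UTF-8 bytes
first = [decode_utf8_2byte(raw[0], raw[1]), decode_utf8_2byte(raw[2], raw[3])]
# Second pass: C3 9F -> U+00DF
second = decode_utf8_2byte(*first)
print(hex(second), chr(second))  # prints: 0xdf ß

# Assumption: UTF-8 bytes read as Latin-9, then encoded as UTF-8 again
double = "ß".encode("utf-8").decode("iso8859_15").encode("utf-8")
print(double == raw)  # prints: True

# Undoing one level of encoding recovers the character
repaired = raw.decode("utf-8").encode("iso8859_15").decode("utf-8")
print(repaired)  # prints: ß
```

For Latin-9 source files, iconv -f ISO-8859-15 -t UTF-8 old.tex > new.tex does the one-time conversion suggested above (the file names are placeholders); the Python equivalent for a bytes object is data.decode("iso8859_15").encode("utf-8").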