[luatex] problem with slnunicode's find

luigi scarso luigi.scarso at gmail.com
Tue Mar 2 14:41:07 CET 2010


On Tue, Mar 2, 2010 at 2:01 PM, Stephan Hennig <mailing_list at arcor.de> wrote:
> On 02.03.2010 07:49, Taco Hoekwater wrote:
>
>> Luatex itself has an internal UTF-8 counting function. At some point
>> (don't know when but before 1.0) the internal Unicode library will
>> replace slnunicode, and I will make sure that it exports a counter as
>> well.
>
> Good to know.  For the time being this paragraph from the LuaTeX manual
>
>> Note: The string library functions find etc. are not Unicode-aware.
>> In cases where this is required (i. e. because the pattern used for
>> searching contains characters above code point 127), the
>> corresponding functions from unicode.utf8 should be used.
>
> is a bit misleading, since just unicode.utf8.find is again not
> Unicode-aware.  The same applies for the empty capture () in match and
> gmatch, BTW.  The output of
>
>  str = "abcde"
>  print(unicode.utf8.match(str, "()e"))
>  str = "Äabcde"
>  print(unicode.utf8.match(str, "()e"))
>
> is 5 and 7.  The second one is obviously wrong.
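I believe 7 is correct: in UTF-8, "Äabcde" is 7 octets long (Ä takes
two octets), so the "e" sits at byte 7, and unittest.c says
 NOTE: find positions are in bytes for all ctypes!
In other words, the returned positions are byte offsets, not character
offsets, so 7 is the expected result here.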
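
To make the byte/character difference concrete, here is a minimal
sketch in plain Lua 5.1 that does not call slnunicode at all. The
helper byte_to_char_pos is a hypothetical name of my own, not part of
any library; it assumes the input is valid UTF-8 and simply counts
the bytes that are not continuation bytes.

```lua
-- Hypothetical helper (not part of slnunicode or LuaTeX): convert a
-- 1-based byte position in a valid UTF-8 string into a 1-based
-- character position.
local function byte_to_char_pos(s, bytepos)
  local count = 0
  for i = 1, bytepos do
    local b = s:byte(i)
    if b == nil then break end        -- bytepos ran past the end of s
    -- Bytes 128..191 (10xxxxxx) are continuation bytes; every other
    -- byte starts a new character.
    if b < 128 or b >= 192 then
      count = count + 1
    end
  end
  return count
end

local plain    = "abcde"
local accented = "\195\132abcde"      -- "Äabcde"; Ä is C3 84 in UTF-8

print(#plain, #accented)                      -- 5   7
print(string.find(accented, "e", 1, true))    -- 7   7  (byte position)
print(byte_to_char_pos(accented, 7))          -- 6      (character position)
```

So a byte position of 7 corresponds to character 6 in the second
string. Until a character-based counter is exported, a conversion
along these lines is one possible workaround.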


-- 
luigi


