[luatex] problem with slnunicode's find
luigi scarso
luigi.scarso at gmail.com
Tue Mar 2 14:41:07 CET 2010
On Tue, Mar 2, 2010 at 2:01 PM, Stephan Hennig <mailing_list at arcor.de> wrote:
> On 02.03.2010 at 07:49, Taco Hoekwater wrote:
>
>> Luatex itself has an internal UTF-8 counting function. At some point
>> (don't know when but before 1.0) the internal Unicode library will
>> replace slnunicode, and I will make sure that it exports a counter as
>> well.
>
> Good to know. For the time being this paragraph from the LuaTeX manual
>
>> Note: The string library functions find etc. are not Unicode-aware.
>> In cases where this is required (i. e. because the pattern used for
>> searching contains characters above code point 127), the
>> corresponding functions from unicode.utf8 should be used.
>
> is a bit misleading, since unicode.utf8.find itself is not fully
> Unicode-aware either. The same applies to the empty capture () in match
> and gmatch, BTW. The output of
>
> str = "abcde"
> print(unicode.utf8.match(str, "()e"))
> str = "Äabcde"
> print(unicode.utf8.match(str, "()e"))
>
> is 5 and 7. The second one is obviously wrong.
I believe 7 is OK: in UTF-8, "Äabcde" is 7 octets long (Ä takes two
octets, so "e" is the seventh), and unittest.c says
NOTE: find positions are in bytes for all ctypes!
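
To make the byte/character difference concrete, here is a minimal sketch
that uses only plain Lua string functions. byte_to_char_pos is a
hypothetical helper written for this illustration; it is not part of
slnunicode or LuaTeX, and it assumes the input is valid UTF-8.

```lua
-- Hypothetical helper (not part of slnunicode or LuaTeX): convert a byte
-- position in a valid UTF-8 string into a character position.
local function byte_to_char_pos(s, byte_pos)
  local chars = 0
  for i = 1, byte_pos do
    local b = s:byte(i)
    -- Bytes 0x80..0xBF are UTF-8 continuation bytes; any other byte
    -- starts a new character.
    if b < 0x80 or b >= 0xC0 then
      chars = chars + 1
    end
  end
  return chars
end

-- "Äabcde", with Ä written as its two UTF-8 bytes (0xC3 0x84)
local str = "\195\132abcde"

-- Plain (non-pattern) find; the result is a byte offset.
local byte_pos = string.find(str, "e", 1, true)

print(byte_pos, byte_to_char_pos(str, byte_pos))
```

This prints 7 and 6: "e" is the seventh byte but the sixth character.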
--
luigi