[luatex] problem with slnunicode's find
Stephan Hennig
mailing_list at arcor.de
Tue Mar 2 19:33:59 CET 2010
On 02.03.2010 17:18, luigi scarso wrote:
> On Tue, Mar 2, 2010 at 4:39 PM, Stephan Hennig<mailing_list at arcor.de> wrote:
>> On 02.03.2010 14:41, luigi scarso wrote:
>>>
>>> I believe 7 is ok, because in utf8 Äabcde is 7 octets long
>>> and unittest.c says
>>> NOTE: find positions are in bytes for all ctypes!
>>
>> Logicians might be satisfied with broken behaviour as long as it's
>> documented.
> I believe that it's not broken behaviour, it's only a mix of two
> different points of view:
> "abstract" (or "sign" or "glyph" or "character"), where we see Ä as a "unit",
> and "implementation", where Ä in utf8 is two octets.
Yes, that's why I call it "broken". Switching the point of view within the
unicode.utf8 functions doesn't seem like good design to me. I cannot see
why it would be sensible to regard the length of Ä as one (character) in
len and two (octets) in find. After all, we already have functions
that return byte positions in a string: string.find and
unicode.ascii.find. Why not drop unicode.utf8.find altogether? That would
be a clear design. (Only beaten by a find function that gives Ä the same
length as len does. There are use cases for such a find function.)
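
To illustrate the kind of find I mean, here is a minimal sketch in plain
Lua, using only the standard string library. The helper names
utf8_chars_upto and find_chars are made up for this example and are not
part of slnunicode. The sketch assumes valid UTF-8 input and does a
plain-text search, not a pattern search.

```lua
-- Count the UTF-8 characters that start within the first `nbytes` bytes
-- of `s`: every byte that is not a continuation byte (10xxxxxx) starts
-- a new character.
local function utf8_chars_upto(s, nbytes)
  local count = 0
  for i = 1, nbytes do
    local b = s:byte(i)
    if b < 0x80 or b >= 0xC0 then
      count = count + 1
    end
  end
  return count
end

-- Hypothetical helper: plain-text find that reports character positions
-- instead of byte positions (assumes `s` and `needle` are valid UTF-8).
local function find_chars(s, needle)
  local first, last = string.find(s, needle, 1, true)
  if not first then return nil end
  return utf8_chars_upto(s, first), utf8_chars_upto(s, last)
end

local s = "\195\132abcde"            -- "Äabcde" in UTF-8 (Ä is C3 84)
print(#s)                            --> 7      (bytes)
print(string.find(s, "e", 1, true))  --> 7  7   (byte positions)
print(find_chars(s, "e"))            --> 6  6   (character positions)
```

With such a function, the position of the last character agrees with what
a character-counting len reports for the whole string.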
>> But I'm not a logician, so I cannot agree. :)
> To be honest I'm not comfortable with regex and unicode.
>
> Perl can help here, but, just to see an example
>
> #> perl -e '$str = "Äabcde"; print length($str),"\n" ;' ;
> 7
> #> perl -e 'use utf8; $str = "Äabcde"; print length($str),"\n" ;' ;
> 6
Same with string.len and unicode.utf8.len in Lua. You made me curious.
Is there a find function in Perl? What values does that return?
Best regards,
Stephan Hennig