[luatex] problem with slnunicode's find
Stephan Hennig
mailing_list at arcor.de
Wed Mar 3 11:49:29 CET 2010
On 03.03.2010 02:03, luigi scarso wrote:
> On Tue, Mar 2, 2010 at 8:15 PM, Stephan Hennig <mailing_list at arcor.de> wrote:
>> On 02.03.2010 18:25, luigi scarso wrote:
>>> On Tue, Mar 2, 2010 at 4:28 PM, Stephan Hennig <mailing_list at arcor.de> wrote:
>>>
>>>> While the latter two functions in general
>>>> {\it are} \UNICODE|-|aware, they fall-back to non|-|\UNICODE|-|aware
>>>> behaviour when using the empty capture \lua{()} (other captures work as
>>>> expected).
>>>
>>> Hm, I don't understand this.
>>
>> Neither do I. :) SCNR
> I mean: you said that the empty capture is not unicode-aware
> but the others are ok (about match and gmatch).
> Can you make a small example?
I wanted to mail you off-list anyway; it was just late yesterday.
Here is an example:
str = "ä#Ö"
print("str: ", str)
-- This considers 'Ö' a single upper-case letter, i.e.,
-- 'Ö' is one (character) long.
print('match("%u"): ', unicode.utf8.match(str, "(%u)"))
-- Like len does.
print('len("Ö"): ', unicode.utf8.len("Ö"))
-- This returns the byte position of 'Ö' in the string, i.e.,
-- it considers the length of 'ä' as two (bytes).
print('match("()%u"): ', unicode.utf8.match(str, "()%u"))
-- Unlike len.
print('len("ä"): ', unicode.utf8.len("ä"))
Running it with texlua gives:
    >texlua empty.lua
    str: 	ä#Ö
    match("%u"): 	Ö
    len("Ö"): 	1
    match("()%u"): 	4
    len("ä"): 	1
Note that the empty capture () doesn't return the matched text but, in
case of a match, its position within the string, similar to find. So
it's no surprise that it returns byte positions. But one can argue
whether that is documented behaviour.
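If a character position is needed, the byte position can be converted afterwards. The following is a minimal sketch, assuming the string is valid UTF-8. The helper char_pos is hypothetical (not part of slnunicode or LuaTeX) and uses only the standard string library: it counts the UTF-8 lead bytes (every byte outside the continuation range 0x80-0xBF) that precede the reported byte position.

```lua
-- Hypothetical helper: convert a 1-based byte position (as returned by
-- the empty capture "()") into a 1-based character position.
-- Assumes s is valid UTF-8; uses only the standard string library.
local function char_pos(s, byte_pos)
  -- Bytes before the captured position.
  local prefix = s:sub(1, byte_pos - 1)
  -- Every byte that is not a continuation byte (0x80-0xBF) starts a
  -- character; gsub's second result is the number of replacements.
  local _, n = prefix:gsub("[^\128-\191]", "")
  -- The character at byte_pos is the one after those n characters.
  return n + 1
end

local str = "ä#Ö"
-- Plain string.match counts bytes, just like the empty capture above.
local bpos = string.match(str, "()#")
print(bpos, char_pos(str, bpos))   --> 3	2
-- The 4 reported by match("()%u") maps to character 3, i.e. 'Ö'.
print(char_pos(str, 4))            --> 3
```

Counting lead bytes avoids decoding the code points at all, so the same idea works on plain Lua strings regardless of which Unicode library produced the byte position.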
Best regards,
Stephan Hennig