[luatex] problem with slnunicode's find

Stephan Hennig mailing_list at arcor.de
Wed Mar 3 11:49:29 CET 2010


On 03.03.2010 02:03, luigi scarso wrote:
> On Tue, Mar 2, 2010 at 8:15 PM, Stephan Hennig<mailing_list at arcor.de>  wrote:
>> On 02.03.2010 18:25, luigi scarso wrote:
>>> On Tue, Mar 2, 2010 at 4:28 PM, Stephan Hennig<mailing_list at arcor.de>
>>>   wrote:
>>>
>>>>   While the latter two functions in general
>>>> {\it are} \UNICODE|-|aware, they fall-back to non|-|\UNICODE|-|aware
>>>> behaviour when using the empty capture \lua{()} (other captures work as
>>>> expected).
>>>
>>> Hm, I don't understand this.
>>
>> Neither do I. :)  SCNR
> I mean: you said that the empty capture is not unicode-aware
> but the others are ok (about match and gmatch).
> Can you make a small example?

I wanted to mail you off-list, anyway.  It was just late yesterday. 
Here is an example:

str = "ä#Ö"
print("str: ", str)

-- This considers 'Ö' a single upper-case letter, i.e.,
-- 'Ö' is one (character) long.
print('match("%u"): ', unicode.utf8.match(str, "(%u)"))
-- Like len does.
print('len("Ö"): ', unicode.utf8.len("Ö"))

-- This returns the byte position of 'Ö' in the string, i.e.,
-- it considers the length of 'ä' as two (bytes).
print('match("()%u"): ', unicode.utf8.match(str, "()%u"))
-- Unlike len.
print('len("ä"): ', unicode.utf8.len("ä"))

> >texlua empty.lua
> str:    ä#Ö
> match("%u"):    Ö
> len("Ö"):      1
> match("()%u"):  4
> len("ä"):      1

Note that the empty capture () doesn't return the matched text, but the
position within the string at which the match occurs, similar to find.
So it is no surprise that it returns byte positions.  Whether that is
documented behaviour is debatable, though.
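
A minimal sketch of how such a byte position could be mapped back to a
character position, assuming the string is valid UTF-8.  The helper
byte_to_char_pos is hypothetical (it is not part of slnunicode) and uses
only the standard string library:

```lua
-- Hypothetical helper (not part of slnunicode): convert a 1-based byte
-- position in a valid UTF-8 string to a 1-based character position.
local function byte_to_char_pos(s, bytepos)
  -- Count the bytes before bytepos that are not UTF-8 continuation
  -- bytes (0x80-0xBF); each such byte starts exactly one character.
  local prefix = string.sub(s, 1, bytepos - 1)
  local _, nchars = string.gsub(prefix, "[^\128-\191]", "")
  return nchars + 1
end

-- "ä#Ö" written with decimal byte escapes, so the example does not
-- depend on the encoding of the source file.
local str = "\195\164#\195\150"

print(#str)                      -- 5 (bytes)
print(byte_to_char_pos(str, 4))  -- 3 ('Ö' is the third character)
```

In texlua, unicode.utf8.len(string.sub(str, 1, pos - 1)) + 1 should give
the same result, but I have not checked that variant.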

Best regards,
Stephan Hennig

