[luatex] problem with slnunicode's find

Stephan Hennig mailing_list at arcor.de
Thu Mar 4 17:00:39 CET 2010


On 04.03.2010 09:15, Jonathan Fine wrote:
> Stephan Hennig wrote:
>>
>>   >  >texlua slnunicode-find.lua
>>   >  line = äb
>>   >  len(line) = 2
>>   >  character 'b' at position 3
>>   >
>>   >  line = öäb
>>   >  len(line) = 3
>>   >  character 'b' at position 5
>>
>> I would expect the positions of 'b' to be 2 and 3, respectively, as those
>> are the lengths of the strings as returned by unicode.utf8.len.
>
>
> Stephan: Is this what you want (except of course in Lua)?
>
> $ python
> Python 2.6.2 (release26-maint, Apr 19 2009, 01:58:18)
> [GCC 4.3.3] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
>   >>>  data = 'äb', 'öäb', u'äb', u'öäb'
>   >>>  data
> ('\xc3\xa4b', '\xc3\xb6\xc3\xa4b', u'\xe4b', u'\xf6\xe4b')
>   >>>  for s in data: print repr(s), s, len(s), s.index('b')
> ...
> '\xc3\xa4b' äb 3 2
> '\xc3\xb6\xc3\xa4b' öäb 5 4
> u'\xe4b' äb 2 1
> u'\xf6\xe4b' öäb 3 2
>
> Above, the two strings appear first in 8-bit (UTF-8 byte) form and then
> as Unicode strings.
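The positions 3 and 5 in the quoted Lua output match the 1-based byte offsets of 'b' in the UTF-8 encodings (Python's 8-bit indices 2 and 4, plus one), while 2 and 3 are the character counts. This suggests that find, as called here, reports byte positions. The Python 3 sketch below converts a 1-based byte position into a 1-based character position by counting UTF-8 lead bytes; `byte_to_char_pos` is a hypothetical helper for illustration, not an slnunicode function.

```python
def byte_to_char_pos(data: bytes, byte_pos: int) -> int:
    """Convert a 1-based byte position in UTF-8 data to a 1-based character position.

    Every UTF-8 encoded character begins with exactly one byte that is not a
    continuation byte (continuation bytes look like 10xxxxxx, i.e. 0x80..0xBF).
    Counting the non-continuation bytes before byte_pos therefore counts the
    characters that precede it.
    """
    if not 1 <= byte_pos <= len(data):
        raise ValueError("byte position out of range")
    preceding = data[:byte_pos - 1]
    chars_before = sum(1 for byte in preceding if (byte & 0xC0) != 0x80)
    return chars_before + 1


for text in ("äb", "öäb"):
    data = text.encode("utf-8")
    byte_pos = data.index(b"b") + 1  # 1-based byte position, as in the Lua output
    print(text, byte_pos, byte_to_char_pos(data, byte_pos))
```

Running it prints `äb 3 2` and `öäb 5 3`: the byte positions from the Lua output on the left, the expected character positions on the right.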

If string indices in Python start at zero, then yes, the second (Unicode)
variant is what I'm after.
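Since Lua positions start at 1 and Python indices at 0, the wanted result is the Unicode-string index plus one. A minimal Python 3 sketch, assuming the line has already been decoded from UTF-8 into a `str`; `char_find` is a hypothetical helper name, not part of slnunicode:

```python
from typing import Optional


def char_find(text: str, sub: str) -> Optional[int]:
    """Return the 1-based character position of sub in text, or None if absent.

    str.find counts code points from 0 and returns -1 on failure; this mirrors
    Lua's convention of 1-based positions, with None standing in for nil.
    """
    index = text.find(sub)
    return None if index == -1 else index + 1


for line in ("äb", "öäb"):
    print(line, len(line), char_find(line, "b"))
```

This prints `äb 2 2` and `öäb 3 3`, i.e. the position of 'b' equals the string length, as expected. Note that Python counts code points, so a decomposed letter (a base letter followed by a combining mark) would count as two.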

Best regards,
Stephan Hennig

