[luatex] problem with slnunicode's find

Mon Mar 1 19:23:12 CET 2010

Hi,

I have trouble getting the position of a character in a UTF-8 string 
with slnunicode.  The attached Lua script reads two strings, 'äb' and 
'öäb', from a file (UTF-8 encoded, I think) and prints their lengths and 
the position of the last character, 'b'.  (Non-ASCII characters are 
garbled in the output because it comes from a Windows console.  But that 
shouldn't matter, should it?)

 > >texlua slnunicode-find.lua
 > line = ├ñb
 > len(line) = 2
 > character 'b' at position 3
 >
 > line = ├Â├ñb
 > len(line) = 3
 > character 'b' at position 5

I would expect the positions of 'b' to be 2 and 3, respectively, since 
those are the lengths of the strings as returned by unicode.utf8.len.  
However, unicode.utf8.find seems to have a different notion of position: 
the values it returns are apparently byte positions.  To correct them 
manually, one would need to know how many of the characters preceding 
'b' are encoded with more than one byte.  But I thought handling this 
is exactly what slnunicode is made for.
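For illustration, the manual correction described above could be sketched in plain Lua (written to run on Lua 5.1 and later, without slnunicode), assuming that unicode.utf8.find really returns byte positions as the output suggests; plain string.find returns byte positions and gives the same numbers 3 and 5 for these strings. The sketch relies on the fact that in UTF-8 every character starts with a byte outside the continuation range 0x80-0xBF, so counting such bytes before the match yields the character position. The helper name byte_to_char_pos is hypothetical.

```lua
-- Hypothetical helper: convert a 1-based byte position in a UTF-8
-- string into a 1-based character position.  Every UTF-8 character
-- begins with a byte that is NOT a continuation byte (0x80-0xBF), so
-- we count those lead bytes in the prefix before the match.
local function byte_to_char_pos(s, bytepos)
  local prefix = s:sub(1, bytepos - 1)
  -- gsub returns the number of substitutions as its second result
  local _, nchars = prefix:gsub("[^\128-\191]", "")
  return nchars + 1
end

-- The two strings from the post, written as explicit UTF-8 bytes
-- ("\195\164" is 'ä', "\195\182" is 'ö') so the file encoding of
-- this script does not matter.
local words = { "\195\164b", "\195\182\195\164b" }

for _, line in ipairs(words) do
  -- plain find (4th argument true): result is a byte position
  local bytepos = string.find(line, "b", 1, true)
  print("byte position:", bytepos,
        "character position:", byte_to_char_pos(line, bytepos))
end
```

For the two strings it reports byte positions 3 and 5 and character positions 2 and 3.  Inside LuaTeX, the counting step could presumably also be written as unicode.utf8.len(prefix) + 1, reusing the slnunicode function that already returns the expected lengths.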

What is the preferred way to get the position of a character in a UTF-8 
string, given that the string contains only 'letters'?

Best regards,
Stephan Hennig

> >texlua -v
> This is LuaTeX, Version beta-0.40.6-2009110118 (Web2C 2009) luatex.web >= v14240

Attachments:
  words.utf8: <http://tug.org/pipermail/luatex/attachments/20100301/4660827b/attachment.pl>
  slnunicode-find.lua: <http://tug.org/pipermail/luatex/attachments/20100301/4660827b/attachment-0001.pl>