[luatex] lua-inputenc

Arthur Reutenauer arthur.reutenauer at normalesup.org
Wed Feb 18 12:58:09 CET 2009


> I'm not very good at lua but I do have some doubt that chars which
> use 3 octets for coding will come out correctly. E.g. the euro-sign
> has in ansinew the position 128, so if I understand your code
> correctly it will be mapped to 0xC2 0x80, but its utf8 code is
> 0xE2 0x82 0xAC

  Yes, it simply means that you fool LuaTeX, which reads the file as if
it were UTF-8, into thinking that it saw character U+0080, so that it
prints character 0x80 to the output file, which turns out to be exactly
what you want if you use the appropriate font.  That's Unicode-heretic,
of course, but a natural trick if you're familiar with the hacks people
commonly used with 8-bit custom fonts in the pre-Unicode days, and I'm
happy to learn that it actually works.  PDF readers might even interpret
the text correctly if they have valid Type 1 fonts (thanks to the glyph
names), so that copy-pasting works, for example.
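
  To make the byte values concrete, here is a minimal sketch in Python.
It is not the lua-inputenc code; the helper name fool_as_utf8 is made up
for illustration, and it assumes the trick amounts to treating each input
byte N as code point U+00NN and writing that code point out as UTF-8.

```python
def fool_as_utf8(raw: bytes) -> bytes:
    """Re-encode each 8-bit byte as the UTF-8 form of the code point
    with the same number (hypothetical helper, for illustration only)."""
    # Latin-1 decoding maps byte N to U+00NN for every N in 0..255.
    return raw.decode("latin-1").encode("utf-8")


euro_ansinew = b"\x80"  # the euro sign in ansinew (Windows-1252)

# The trick: LuaTeX is led to believe it saw U+0080.
fooled = fool_as_utf8(euro_ansinew)

# The Unicode-correct route: 0x80 in cp1252 really means U+20AC.
proper = euro_ansinew.decode("cp1252").encode("utf-8")

print(fooled.hex(" "))  # c2 80
print(proper.hex(" "))  # e2 82 ac
```

  The two results differ in length only because U+0080 and U+20AC fall
in different UTF-8 ranges; in both cases exactly one character is read.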

  It has nothing to do with the number of bytes a character needs in its
UTF-8 form: LuaTeX simply reads the appropriate number of them and
converts the byte stream to a list of Unicode characters upon opening
the file.
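
  As a rough illustration of that decoding step (again in Python, using
its built-in UTF-8 decoder as a stand-in for what LuaTeX does
internally, which is an assumption on my part), a stream mixing
two-byte, three-byte and one-byte sequences comes out as one code point
per character:

```python
# 0xC2 0x80 (two bytes), the real euro sign (three bytes), "A" (one byte)
data = b"\xc2\x80" + "\u20ac".encode("utf-8") + b"A"

# Decode the whole byte stream, then list the code points it contains.
code_points = [ord(ch) for ch in data.decode("utf-8")]

print([f"U+{cp:04X}" for cp in code_points])
# ['U+0080', 'U+20AC', 'U+0041']
```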

	Arthur

