[luatex] lua-inputenc

Ulrike Fischer news2 at nililand.de
Wed Feb 18 18:53:55 CET 2009


On Wed, 18 Feb 2009 12:58:09 +0100, Arthur Reutenauer wrote:

>> I'm not very good at Lua, but I doubt that characters which need
>> 3 octets in UTF-8 will come out correctly. E.g. the euro sign
>> is at position 128 in ansinew, so if I understand your code
>> correctly it will be mapped to
>> 
>> 0xC2 0x80, but its UTF-8 encoding is 0xE2 0x82 0xAC
> 
>   Yes, it simply means that you fool LuaTeX, which reads the file as if
> it was UTF-8, into thinking that it saw character U+0080, so that it
> prints character 0x80 to the output file, which turns out to be exactly
> what you want if you use the appropriate font.  That's Unicode-heretic,
> of course, but a natural trick if you're familiar with the hacks people
> commonly used with 8-bit custom fonts in the pre-Unicode days, and I'm
> happy to learn that it actually works.  PDF readers might even correctly
> interpret the text if they have valid Type 1 fonts (thanks to the glyph
> names) -- so that copy-pasting works, for example.
> 
>   It has nothing to do with the number of bytes a character needs in its
> UTF-8 form; LuaTeX simply reads the appropriate number of them and
> converts the byte stream to a list of Unicode characters upon opening
> the file.

I don't think I understood ;-(.

I do understand that the code converts 8-bit input into legal UTF-8.
What I don't understand is how LaTeX can later print U+0080
(into which the euro sign was converted) as a euro. utf8enc.dfu maps
20AC to the euro; 0080 has no declaration.
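
To make the two byte sequences concrete, here is a small sketch in
Python (not the Lua code under discussion). It assumes the mapping
works as described above, i.e. each input byte is taken as the code
point with the same number, which is what decoding as Latin-1 does.
The variable names are only for illustration.

```python
# Byte 0x80 is the euro sign in Windows-1252 ("ansinew").
raw = bytes([0x80])

# Byte-value mapping (assumed behaviour): byte N becomes code point U+00NN,
# which is then written out as UTF-8.
naive = raw.decode("latin-1")
naive_utf8 = naive.encode("utf-8")

# Encoding-aware mapping: decode as Windows-1252 first, then write UTF-8.
aware = raw.decode("cp1252")
aware_utf8 = aware.encode("utf-8")

print(f"byte-value mapping: U+{ord(naive):04X} -> {naive_utf8.hex(' ')}")
print(f"cp1252 mapping:     U+{ord(aware):04X} -> {aware_utf8.hex(' ')}")
```

The first line shows U+0080 as c2 80, the second U+20AC as e2 82 ac.
Both are legal UTF-8, but only the second is the code point that
utf8enc.dfu declares as the euro.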

-- 
Ulrike Fischer 


