[luatex] lua-inputenc

Manuel Pégourié-Gonnard mpg at elzevir.fr
Wed Feb 18 13:34:01 CET 2009


Arthur Reutenauer wrote:
>> 0xC2 0x80, but its utf8 code is 0xE2 0x82 0xAC
> 
>   Yes, it simply means that you fool LuaTeX, which reads the file as if
> it was UTF-8, into thinking that it saw character U+0080, so that it
> prints character 0x80 to the output file, which turns out to be exactly
> what you want if you use the appropriate font.  That's Unicode-heretic,
> of course, but a natural trick if you're familiar with the hacks people
> commonly used with 8-bit custom fonts in the pre-Unicode days, and I'm
> happy to learn that it actually works.  PDF readers might even correctly
> interpret the text if they have valid Type 1 fonts (thanks to the glyph
> names) -- so that copy-pasting works, for example.
> 
>   It has nothing to do with the number of bytes a character needs in its
> UTF-8 form, LuaTeX simply reads the appropriate number of them and
> converts the byte stream to a list of Unicode characters upon opening
> the file.
> 
I agree, in some sense it is the easiest way to make inputenc "work"
with LuaTeX exactly the same way as it works with an 8-bit TeX.
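
To make the byte-level picture concrete, here is a small sketch in Python
(not LuaTeX or inputenc code). It assumes the legacy file is in cp1252,
where the euro sign is byte 0x80; all variable names are made up for
illustration.

```python
# Illustrative only: plain Python, not LuaTeX or inputenc code.
# Assumption: the legacy file is in cp1252, where the euro sign is byte 0x80.

EURO = "\u20ac"

# The euro sign as stored in the legacy 8-bit file and in a real UTF-8 file.
legacy_byte = EURO.encode("cp1252")   # one byte: 0x80
real_utf8 = EURO.encode("utf-8")      # three bytes: 0xE2 0x82 0xAC

# The trick: feed the engine the UTF-8 form of U+0080, so that a reader
# expecting UTF-8 "sees" character 0x80 rather than U+20AC.
fooled = chr(legacy_byte[0]).encode("utf-8")   # two bytes: 0xC2 0x80

# What a UTF-8 reader decodes from each byte sequence.
seen_with_trick = [hex(ord(c)) for c in fooled.decode("utf-8")]
seen_with_utf8 = [hex(ord(c)) for c in real_utf8.decode("utf-8")]

print(fooled.hex(" "), "->", seen_with_trick)
print(real_utf8.hex(" "), "->", seen_with_utf8)
```

With the trick the engine ends up with character 0x80, which only means
"euro" if the font happens to have that glyph in slot 0x80; with real
UTF-8 input it ends up with U+20AC.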

But it means you also keep all the problems of the current inputenc over
8-bit TeX. I really wonder whether it's useful to keep that level of
compatibility: it would just mean no benefit on the encoding side for
the unaware user who switches to the luatex engine without also switching
to "real" utf-8 input without inputenc...

Manuel.


