[tex-live] Problems with non-7bit characters in filename
Klaus Ethgen
Klaus+texlivelist at ethgen.ch
Sat Jul 5 15:29:51 CEST 2014
Hi,
On Fri, 4 Jul 2014 at 9:42, Robin Fairbairns wrote:
> no, it's correct: iso 8859-1 has no "forbidden" octets (it does, iirc,
> have some unassigned ones)
>
> whereas
>
> utf-8 rejects some octets in some contexts, since it's generating a
> 32-bit glyph from 8-bit input. (it's complicated. honest.)
See the output from the following line:
perl -MEncode -e 'for (my $i=0; $i < 256; $i++){printf "%d:\t%s\t%s\t0x%s\t0x%s\n", $i, chr($i), encode("UTF-8", decode("iso-8859-1", chr($i))), unpack("H*", chr($i)), unpack("H*", encode("UTF-8", decode("iso-8859-1", chr($i))));}' | less
I pipe it through less because not all characters are kind to the
terminal. On a full 8-bit terminal this prints the Latin-1 character, the
same character converted to UTF-8, and the hex value of both. On a UTF-8
terminal it will most likely not work (or perl will tell you that
something is wrong), or it will simply show nothing in the first column,
because the terminal cannot display every octet that can occur in
Latin-1.
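For anyone without Perl at hand, roughly the same table can be sketched in
Python. This is only an illustration under my own assumptions, not part of
the one-liner above: the names latin1_table and is_valid_utf8 are made up
for this example, and it prints only the hex columns, so it is safe on any
terminal.

```python
def latin1_table():
    """Return (code, latin1 hex, utf8 hex) for every octet 0..255."""
    rows = []
    for i in range(256):
        raw = bytes([i])                 # the single Latin-1 octet
        text = raw.decode("iso-8859-1")  # never fails: every octet is accepted
        utf8 = text.encode("utf-8")      # 1 byte below 128, 2 bytes from 128 up
        rows.append((i, raw.hex(), utf8.hex()))
    return rows


def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    for code, latin1_hex, utf8_hex in latin1_table():
        print(f"{code}:\t0x{latin1_hex}\t0x{utf8_hex}")
```

A lone octet such as 0xE4 is fine as Latin-1, but is_valid_utf8(b"\xe4")
returns False, while the two-octet sequence b"\xc3\xa4" is accepted. That
is the difference Robin describes above.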
Regards
Klaus
--
Klaus Ethgen http://www.ethgen.ch/
pub 4096R/4E20AF1C 2011-05-16 Klaus Ethgen <Klaus at Ethgen.de>
Fingerprint: 85D4 CA42 952C 949B 1753 62B3 79D0 B06F 4E20 AF1C