[tex-live] Problems with non-7bit characters in filename

Reinhard Kotucha reinhard.kotucha at web.de
Fri Jul 4 03:17:29 CEST 2014


On 2014-07-04 at 00:08:42 +0100, Klaus Ethgen wrote:

 > On Thu, 3 Jul 2014 at 23:46, Zdenek Wagner wrote:
 > > > I was pointed to this list to report the following Bug. Please put me in
 > [Bug in filesystem code]
 > > Lualatex is right, umlaut characters in latin1 are invalid sequences
 > 
 > That's true. While latin1 can include every possible character, UTF-8
 > cannot. (possible as in possible to have on the wire)

You misunderstood.  The opposite is true.  UTF-8 (Unicode) supports
all characters, whereas Latin1 is a simple 8-bit encoding which
supports only Western European languages (except French, which needs
characters such as œ that Latin1 lacks).
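
To make the difference concrete, here is a minimal Python sketch.  The
sample strings and the two helper functions are made up for this
illustration; they are not taken from LuaTeX or any other program
discussed here.

```python
# Illustration only: sample strings and helper names are hypothetical.
name = "Käse"

latin1_bytes = name.encode("latin-1")  # one byte per character
utf8_bytes = name.encode("utf-8")      # "ä" becomes a two-byte sequence


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string is well-formed UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def fits_latin1(text: str) -> bool:
    """Return True if every character of text exists in Latin1."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


print(is_valid_utf8(latin1_bytes))  # Latin1 umlaut is not valid UTF-8
print(is_valid_utf8(utf8_bytes))
print(fits_latin1("Привет"))        # Russian does not fit into Latin1
print(fits_latin1("œuvre"))         # neither does the French œ
```

The Latin1 byte for "ä" starts a multi-byte sequence in UTF-8 but is
not followed by a continuation byte, which is why a program that
expects UTF-8 rejects such a file name.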

UTF-8 is the encoding of the future because it supports all languages
used today.  This is the reason why XeTeX and LuaTeX exist at all.

When I took over maintenance of VnTeX my OS still used Latin1.  It was
a pain!  I then switched to UTF-8 and everything worked fine.

I must admit that the change was easy in my case because I had
avoided non-ASCII characters in file names in the past.  Nowadays (with
UTF-8) I don't hesitate at all to use Russian or Korean characters in
file or directory names.

IMO all these national ISO-2022/ISO-8859 encodings are archaic.  The
future is UTF-8.

 > > in utf-8 but both luatex and xetex work internally in unicode. I
 > > am not sure whether it is possible to change interaction with
 > > file system encoding easily.
 > 
 > Why convert the filename at all? The file name is the same on
 > command line and on the file system. So without any reencoding
 > everything would be fine.

It's not always the case.  A German Windows uses CP1252 as its ANSI
code page, CP850 on the command line, and UTF-16 internally for file
names.  It's a pain.
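
A small Python sketch of why this hurts.  The file name is a made-up
example, and the roundtrip helper is hypothetical; CP1252 is the ANSI
code page and CP850 the console (OEM) code page of a German Windows,
while UTF-16 is what the file system uses.

```python
# Illustration only: the file name and the helper are hypothetical.
filename = "Bär.tex"

ansi = filename.encode("cp1252")     # ANSI code page
oem = filename.encode("cp850")       # console (OEM) code page
wide = filename.encode("utf-16-le")  # two bytes per character here


def roundtrip(data: bytes, assumed: str) -> str:
    """Decode bytes under an assumed code page, as a program might."""
    return data.decode(assumed)


print(ansi)  # "ä" is the single byte 0xE4
print(oem)   # "ä" is the single byte 0x84
print(wide)

# Bytes typed on the console but interpreted as CP1252 name another file.
print(roundtrip(oem, "cp1252") == filename)
```

The same name has three different byte representations, so a program
that passes the bytes through without knowing which encoding they are
in will look for the wrong file.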

 > > Anyway, many years ago when I did not use utf-8 in Linux, such a file
 > > name did not work even in OpenOffice.

Yes, AFAIK OpenOffice gracefully supports UTF-8.  You should have
configured your file system to use UTF-8.  This has been the default
for years (on Linux and OS X, at least).  Windows always lags 20..30
years behind and still insists on CP1252 (CP850 on the command line)
for German and similar idiocies for other languages.

 > I never had those problems with latin1 (except with a few programs
 > like luatex). But I had many problems in the past when trying to use UTF-8.
 > However, that personal stuff is good to know but does not help in this
 > situation.

But now you have problems with Latin1.  The reason is that you still
insist on archaic encodings like Latin1 while the rest of the world is
striving towards Unicode.  I strongly recommend switching to UTF-8
completely.  If you're on Linux or OS X, simply stick with the defaults.

 > Fact is that even software that uses UTF-8 (or other unicode) internally
 > works well in my environment. (Examples: Libreoffice, Gimp, Geeqie, ...
 > (Geeqie, I am one of the people working on it)) So it must be possible
 > to do that in lualatex or xetex too.

I don't know what you want to achieve.  You said:

 > While latin1 can include every possible character, UTF-8 cannot.

This is definitely wrong.  The opposite is true.

  http://www.unicode.org/standard/WhatIsUnicode.htm


Regards,
  Reinhard

-- 
------------------------------------------------------------------
Reinhard Kotucha                            Phone: +49-511-3373112
Marschnerstr. 25
D-30167 Hannover                    mailto:reinhard.kotucha at web.de
------------------------------------------------------------------


