[tex-live] Unicode filename problem

Torsten Ekedahl teke at math.su.se
Sat Nov 1 10:27:26 CET 2008


I've just switched from iso-latin-1 to UTF-8 on my computer. This has revealed 
a problem with tex.

In case it matters, I use the Ubuntu 2007-13 version of texlive and get the
following version information:

homealone[1]latex
This is pdfTeXk, Version 3.141592-1.40.3 (Web2C 7.5.6)

Anyway, the problem is with non-ASCII letters in file names. There is no
problem if I run latex directly on the file:

homealone[1]latex inlämning.tex
This is pdfTeXk, Version 3.141592-1.40.3 (Web2C 7.5.6)
 %&-line parsing enabled.
entering extended mode
(./inlämning.tex
LaTeX2e <2005/12/01>
Babel <v3.8h> and hyphenation patterns for english, usenglishmax, swedish, dumylang, nohyphenation, loaded.
)
*

(The actual file name may come out funny in this mail, but it is
inl<adiaeresis>mning.tex.)
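To see why the switch from Latin-1 to UTF-8 matters here, a small Python sketch (the helper name `encoded_bytes` is hypothetical) can compare the bytes stored for this file name under each encoding. In ISO-8859-1 the ä is the single byte 0xE4; in UTF-8 it becomes the two bytes 0xC3 0xA4, so an 8-bit engine such as pdfTeX now reads two characters where there used to be one.

```python
def encoded_bytes(name: str, encoding: str) -> bytes:
    """Return the bytes stored for `name` when it is written in `encoding`."""
    return name.encode(encoding)


# "inlämning.tex"; the a-diaeresis is Unicode code point U+00E4.
NAME = "inl\u00e4mning.tex"

for enc in ("iso-8859-1", "utf-8"):
    data = encoded_bytes(NAME, enc)
    # bytes.hex(sep) needs Python 3.8 or later.
    print(f"{enc:10} {len(data):2} bytes: {data.hex(' ')}")

# iso-8859-1 13 bytes: 69 6e 6c e4 6d 6e 69 6e 67 2e 74 65 78
# utf-8      14 bytes: 69 6e 6c c3 a4 6d 6e 69 6e 67 2e 74 65 78
```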

However, if I try to \input the file from another file, it doesn't work:

homealone[1]cat test.tex
\input inlämning.tex
homealone[1]latex test.tex
This is pdfTeXk, Version 3.141592-1.40.3 (Web2C 7.5.6)
 %&-line parsing enabled.
entering extended mode
(./retex.tex
LaTeX2e <2005/12/01>
Babel <v3.8h> and hyphenation patterns for english, usenglishmax, swedish, dumylang, nohyphenation, loaded.
(./inlämning.tex) (/usr/share/texmf-texlive/tex/latex/base/article.cls
Document Class: article 2005/09/16 v1.4f Standard LaTeX document class
(/usr/share/texmf-texlive/tex/latex/base/size10.clo))
(/usr/share/texmf-texlive/tex/latex/base/inputenc.sty
(/usr/share/texmf-texlive/tex/latex/base/utf8.def
(/usr/share/texmf-texlive/tex/latex/base/t1enc.dfu)
(/usr/share/texmf-texlive/tex/latex/base/ot1enc.dfu)
(/usr/share/texmf-texlive/tex/latex/base/omsenc.dfu)))
! I can't find file `inl'.
<to be read again>
                   \unhbox
l.8 \input inlä
                mning.tex

I get the same result if I first try to switch to UTF-8 input encoding:
homealone[1]cat test.tex
\documentclass[a4paper,twoside]{article}
\usepackage[utf8]{inputenc}
\input inlämning.tex

This is probably not too surprising (and I wasn't surprised), but it is not
clear to me that this is the way God intended it to be. However, it becomes
more surprising if one tries to dump a format:

homealone[1]cat retex.tex
\let\DDDD\dump
\let\dump\relax
\input latex.ltx
\documentclass[a4paper,twoside]{article}
\usepackage[utf8]{inputenc}
\DDDD
homealone[1]pdftex --ini --output-format dvi retex.tex
homealone[1]pdftex '&retex' inlämning.tex
This is pdfTeXk, Version 3.141592-1.40.3 (Web2C 7.5.6)
 %&-line parsing enabled.
! I can't find file `inl'.
<to be read again>
                   \unhbox
<*> &retex inl^^c3^^a4
                      mning.tex
Please type another input file name:
! Emergency stop.
<to be read again>
                   \unhbox
<*> &retex inl^^c3^^a4
                      mning.tex

Apart from the fact that the character is reported differently (ä in one case,
^^c3^^a4 in the other), I more or less understand that the conversion from
UTF-8 to TeX's internal character format plays havoc with the file name.
However, I wanted to bring it to everyone's attention, but I wouldn't be too
upset with a "don't do that then" kind of answer.
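The ^^c3^^a4 in the last log is TeX's notation for the two UTF-8 bytes of ä. The Python sketch below (the function names are hypothetical) mimics the default display rule from Knuth's tex.web for characters TeX considers unprintable, assuming only codes 32 to 126 count as printable. Web2C engines can mark more codes as printable (for instance through a TCX translation file), which may be why the ordinary latex run shows ä while the run with the home-made format shows ^^c3^^a4.

```python
def tex_show_char(code: int) -> str:
    """Display one 8-bit character code using TeX's default ^^ convention.

    Assumption: only codes 32..126 are printable, as in Knuth's TeX
    without a Web2C translation file or -8bit option.
    """
    if 32 <= code <= 126:
        return chr(code)
    if code < 64:            # control codes 0..31 become ^^@ .. ^^_
        return "^^" + chr(code + 64)
    if code < 128:           # only 127 (DEL) reaches here and becomes ^^?
        return "^^" + chr(code - 64)
    return "^^" + format(code, "02x")  # 128..255 become ^^80 .. ^^ff


def tex_show_bytes(data: bytes) -> str:
    """Display a byte string the way TeX would echo it character by character."""
    return "".join(tex_show_char(b) for b in data)


print(tex_show_bytes("inl\u00e4mning.tex".encode("utf-8")))
# inl^^c3^^a4mning.tex
```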

		Torsten



