# [tex-live] Unicode filename problem

Torsten Ekedahl teke at math.su.se
Sat Nov 1 10:27:26 CET 2008

```I've just switched from iso-latin-1 to UTF-8 on my computer. This has revealed
a problem with tex.

In case it matters, I use the Ubuntu 2007-13 version of texlive and get the
following version information:

homealone[1]latex
This is pdfTeXk, Version 3.141592-1.40.3 (Web2C 7.5.6)

Anyway the problem is with non-ASCII letters in file names. There is no
problem if I use straight latex:

homealone[1]latex inlämning.tex
This is pdfTeXk, Version 3.141592-1.40.3 (Web2C 7.5.6)
%&-line parsing enabled.
entering extended mode
(./inlämning.tex
LaTeX2e <2005/12/01>
Babel <v3.8h> and hyphenation patterns for english, usenglishmax, swedish,
dumy
)
*

(The actual filename may come out funny in this mail but it is
inl aedieresis mning.tex )

However, if I try to \input the file from another one, it doesn't work:

homealone[1]cat test.tex
\input inlämning.tex
homealone[1]latex test.tex
This is pdfTeXk, Version 3.141592-1.40.3 (Web2C 7.5.6)
%&-line parsing enabled.
entering extended mode
(./retex.tex
LaTeX2e <2005/12/01>
Babel <v3.8h> and hyphenation patterns for english, usenglishmax, swedish,
dumy
(./inlämning.tex) (/usr/share/texmf-texlive/tex/latex/base/article.cls
Document Class: article 2005/09/16 v1.4f Standard LaTeX document class
(/usr/share/texmf-texlive/tex/latex/base/size10.clo))
(/usr/share/texmf-texlive/tex/latex/base/inputenc.sty
(/usr/share/texmf-texlive/tex/latex/base/utf8.def
(/usr/share/texmf-texlive/tex/latex/base/t1enc.dfu)
(/usr/share/texmf-texlive/tex/latex/base/ot1enc.dfu)
(/usr/share/texmf-texlive/tex/latex/base/omsenc.dfu)))
! I can't find file `inl'.
\unhbox
l.8 \input inlä
mning.tex

I get the same result if I first try to switch to utf8 encoding:
homealone[1]cat test.tex
\documentclass[a4paper,twoside]{article}
\usepackage[utf8]{inputenc}
\input inlämning.tex

This is probably not too surprising (and I wasn't surprised), but it is not
clear to me that this is the way God intended it to be. However, it becomes
more surprising if one tries to dump a format:

homealone[1]cat retex.tex
\let\DDDD\dump
\let\dump\relax
\input latex.ltx
\documentclass[a4paper,twoside]{article}
\usepackage[utf8]{inputenc}
\DDDD
homealone[1]pdftex --ini --output-format dvi retex.tex
homealone[1]pdftex '&retex' inlämning.tex
This is pdfTeXk, Version 3.141592-1.40.3 (Web2C 7.5.6)
%&-line parsing enabled.
! I can't find file `inl'.
\unhbox
<*> &retex inl^^c3^^a4
mning.tex
Please type another input file name:
! Emergency stop.
\unhbox
<*> &retex inl^^c3^^a4
mning.tex

Apart from the fact that different characters are reported (ä vs ^^c3^^a4), I
more or less understand that the conversion from UTF-8 to TeX's internal
character format plays havoc with the file name. However, I wanted to bring
it to everyone's attention but wouldn't be too upset with a "don't do that