Non-ASCII characters in filenames/Unicode
John Collins
jcc8 at psu.edu
Thu Mar 10 00:24:47 CET 2022
On 3/8/22 12:30 AM, Akira Kakuto wrote:
>
> Mixed encoding is a bug.
> In the new Windows binaries, PWD is also encoded in UTF-8 in
> luatex, pdftex, xetex, uptex, and euptex.
Great. I'll leave my workaround in latexmk, since not everyone updates
TeX Live to the latest version. (The workaround tests whether or not the
PWD line is valid UTF-8, so it will behave properly with the new binaries.)
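To illustrate the kind of test meant here, a minimal sketch in Python (latexmk itself is written in Perl, so this is not its actual code). The function names and the fallback code page "cp1252" are assumptions made for the example only; the real fallback would depend on the system.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


def decode_pwd_line(raw: bytes, fallback: str = "cp1252") -> str:
    """Decode a PWD line from a log-like file.

    New binaries write UTF-8; older ones may use the system code page.
    The fallback encoding is a hypothetical default for this sketch.
    """
    if is_valid_utf8(raw):
        return raw.decode("utf-8")
    return raw.decode(fallback, errors="replace")
```

Since pure ASCII is also valid UTF-8, a path without non-ASCII characters takes the UTF-8 branch and comes out unchanged either way.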
There's one other UTF-8 anomaly I noticed (and have workaround code for in
latexmk): line wrapping in a .log file by pdflatex and lualatex doesn't
respect character boundaries. They wrap at a particular number of bytes,
which in a default installation is 79, so a wrap can fall in the middle of a
multi-byte character. You are only guaranteed to get valid UTF-8 after
undoing the line wrapping. In contrast, xelatex wraps at 79 code points.
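A small Python sketch of the difference, under stated assumptions: the helper names are made up for the example, the wrapping functions only imitate the two behaviors described above, and the unwrapping rule (a line of exactly 79 bytes is taken to continue on the next line) is a simple heuristic, not necessarily what latexmk does.

```python
WRAP = 79  # default wrap width mentioned above


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def wrap_bytes(text: str, width: int = WRAP) -> list[bytes]:
    """Imitate byte-based wrapping (as described for pdflatex/lualatex)."""
    raw = text.encode("utf-8")
    return [raw[i:i + width] for i in range(0, len(raw), width)]


def wrap_code_points(text: str, width: int = WRAP) -> list[bytes]:
    """Imitate code-point-based wrapping (as described for xelatex)."""
    return [text[i:i + width].encode("utf-8")
            for i in range(0, len(text), width)]


def unwrap_bytes(lines: list[bytes], width: int = WRAP) -> list[bytes]:
    """Rejoin byte-wrapped lines BEFORE decoding.

    Heuristic: a line of exactly `width` bytes is assumed to be wrapped.
    A genuine line that happens to be that long is joined by mistake.
    """
    logical, current = [], b""
    for line in lines:
        current += line
        if len(line) < width:
            logical.append(current)
            current = b""
    if current:
        logical.append(current)
    return logical


# 78 ASCII bytes, then a 2-byte character that straddles the wrap point.
text = "a" * 78 + "é" + "b" * 5
pieces = wrap_bytes(text)
```

Here the first element of pieces ends with only the first byte of "é", so it is not valid UTF-8 by itself; joining the pieces with unwrap_bytes and decoding afterwards recovers the original text. With wrap_code_points every piece is valid UTF-8, but its length in bytes can exceed 79, so a byte-count heuristic like the one above no longer applies to it.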
Dealing with line wrapping is important to latexmk, so that it can extract
dependency information properly. Once I realized the different behavior of
xelatex, the true source of a user-reported bug in latexmk became clear.
Probably the best solution for latexmk is to turn off line wrapping by the
programs it invokes. But that may make things less pleasant for apps that
display log files to the user (TeXShop, TeXworks, etc.).
John Collins