[tex-live] xindy and folders with non ascii chars
Richard M Kreuter
kreuter at progn.net
Sun Sep 23 23:02:41 CEST 2018
Bruno Haible writes:
> > Karl Berry writes in
> > <https://www.tug.org/pipermail/tex-live/2018-September/042413.html>:
> > > Why not
> > > #define VALID_FILENAME_CHAR (1)
> > > ? What is gained by all these conditions?
> >
> > When the user enters an invalid file name,
> > 1. clisp signals an error before the file name hits the file system,
> > namely already when the Lisp pathname gets constructed,
> > 2. the error message indicates the cause (remember that errors on
> > a file system can be caused by invalid file names, permission
> > problems, or even temporary issues like disk-full problems).
>
> And 3. On some systems, really erratic things happen when you pass
> file names with invalid bytes to the operating system.
(Not so much for Bruno, but as context for other readers who may be
unfamiliar with the Common Lisp language or the Clisp
implementation...)
The Common Lisp language standard requires that when a file operation
receives a string argument, the operation implicitly parses the string
and conditionally augments the parse with information that might be
construed as ``missing'' (for example, by supplying an extension if the
string has none). This behavior bears a sort of family
resemblance to TeX82's filename handling, as Common Lisp's ancestor
languages also evolved on PDP-10 systems. For example, the parsing is
loosely analogous to scan_file_name in section 526 of TeX82; and the
augmenting is somewhat like a generalization of both pack_buffered_name
in section 523 and pack_job_name in section 529.
Additionally, the Common Lisp language standard allows the
implementation to detect invalid file specification syntax at its
discretion; that's what Clisp is up to here.
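To make the parse-then-augment idea concrete for readers who don't use Lisp, here is a toy model in Python. It is only a sketch of the general shape, not Clisp's parser or the standard's actual rules: the ToyPathname type, the three function names, and the "reject NUL characters" validity rule are all made up for illustration.

```python
import posixpath
from typing import NamedTuple, Optional


class ToyPathname(NamedTuple):
    """Toy stand-in for a parsed file specification (not Clisp's structure)."""
    directory: str
    name: str
    extension: Optional[str]


def parse_filespec(spec: str) -> ToyPathname:
    """Split a string into components, rejecting syntax this toy calls invalid."""
    # Hypothetical validity rule, checked before any system call is made.
    if "\0" in spec:
        raise ValueError(f"invalid file specification: {spec!r}")
    directory, base = posixpath.split(spec)
    name, dot, ext = base.rpartition(".")
    if not dot or not name:
        # No dot, or a leading dot as in ".profile": treat as no extension.
        return ToyPathname(directory, base, None)
    return ToyPathname(directory, name, ext)


def merge_defaults(parsed: ToyPathname, defaults: ToyPathname) -> ToyPathname:
    """Fill in components the parse left empty, component by component."""
    return ToyPathname(
        parsed.directory or defaults.directory,
        parsed.name or defaults.name,
        parsed.extension if parsed.extension is not None else defaults.extension,
    )


def unparse(p: ToyPathname) -> str:
    """Turn the components back into a string for the operating system."""
    base = p.name if p.extension is None else f"{p.name}.{p.extension}"
    return posixpath.join(p.directory, base)
```

With defaults of ToyPathname("/home/me", "", "idx"), the string "paper" comes back out as "/home/me/paper.idx", while "/tmp/a.raw" passes through unchanged; a string containing a NUL character is rejected during the parse, before anything reaches the file system.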
Anyway, under ordinary circumstances, the consequences of the parsing
and augmenting are effectively null. However, since most modern
programming languages simply pass strings to system calls without any
parsing or augmentation (albeit, for some languages, with implicit
encoding to code points), the fact that Common Lisp is required to parse
and permitted to signal an error during the parse might be considered
surprising.
Additional file naming notes that could trip up xindy users on Clisp:
1. [Probably relevant only on Unix.] To my knowledge, Clisp's file
handling offers no means to address any file or directory using a
specification that contains either a question mark or an asterisk. There
are some workarounds:
1a. If the offending character occurs in a directory, change to that
directory before starting Clisp, and address the file using a relative
file specification that omits the directory.
1b. Create another name for the file, either by linking or renaming the
file or offending directory. (To my knowledge, it's impossible to do
this from within Clisp itself.)
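As an illustration of workaround 1b, the following Python sketch creates a second, wildcard-free name for an awkward directory by symbolic link. It is meant to be run outside Clisp, before starting xindy; the function name make_wildcard_free_alias and the example directory and file names are hypothetical.

```python
import os
import tempfile


def make_wildcard_free_alias(target: str, alias: str) -> str:
    """Create a symbolic link `alias` pointing at `target`; return `alias`."""
    # The whole point is a name without these two characters, so refuse
    # an alias that still contains one of them anywhere in its path.
    if "?" in alias or "*" in alias:
        raise ValueError("alias must not contain '?' or '*'")
    os.symlink(os.path.abspath(target), alias)
    return alias


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # A directory whose name contains a question mark (Unix only).
        awkward_dir = os.path.join(tmp, "what?")
        os.mkdir(awkward_dir)
        with open(os.path.join(awkward_dir, "book.idx"), "w") as f:
            f.write("\\indexentry{clisp}{1}\n")
        alias = make_wildcard_free_alias(awkward_dir, os.path.join(tmp, "what"))
        # The same file is now reachable under a name without wildcards.
        with open(os.path.join(alias, "book.idx")) as f:
            print(f.read(), end="")
```

A symbolic link leaves the original name in place, which may be preferable to renaming when other tools already depend on it.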
2. When Clisp's file specification parser encounters a "dotdot"
directory, it elides the dotdot and the directory level preceding it,
e.g., the string
"/home/me/foo/../bar"
parses to an object that denotes the same file as
"/home/me/bar"
This parsing behavior is documented in Clisp's manual, and so is
presumably deliberate; however, it has the consequence that if foo is a
symbolic link to some directory other than an immediate subdirectory of
/home/me, the parse will denote a different file than the original
string does under Unix or Windows pathname resolution rules.
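The discrepancy can be reproduced without Clisp. In the Python sketch below, os.path.normpath stands in for purely textual dotdot elision (similar in effect to the parse described above, though it is not Clisp's code), and os.path.realpath stands in for the operating system's resolution, which follows symbolic links. The directory names elsewhere/deep/sub and both function names are made up for the demonstration.

```python
import os
import tempfile
from typing import Tuple


def lexical_and_resolved(spec: str) -> Tuple[str, str]:
    """Return (textual dotdot elision, OS-style resolution) for `spec`."""
    return os.path.normpath(spec), os.path.realpath(spec)


def build_demo_tree(root: str) -> str:
    """Create <root>/home/me/foo as a symlink to <root>/elsewhere/deep/sub.

    Returns the specification <root>/home/me/foo/../bar.
    """
    me = os.path.join(root, "home", "me")
    sub = os.path.join(root, "elsewhere", "deep", "sub")
    os.makedirs(me)
    os.makedirs(sub)
    os.symlink(sub, os.path.join(me, "foo"))
    return os.path.join(me, "foo", "..", "bar")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        lexical, resolved = lexical_and_resolved(build_demo_tree(root))
        print("textual elision:", lexical)   # ends in home/me/bar
        print("OS resolution:  ", resolved)  # ends in elsewhere/deep/bar
```

Because foo points into elsewhere/deep/sub, the operating system takes ".." to mean elsewhere/deep, whereas the textual elision never looks at the file system and lands in home/me.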
There are some workarounds for this, too:
2a. Change to the desired directory before starting Clisp, and address
files using only relative specifications that omit the directory.
2b. If it's necessary to use a specification that includes a directory,
figure out a name for the directory that does not include dotdot. One
way to do that on Clisp is to use ext:cd repeatedly to resolve the
directory part of a string prior to passing the string to any standard
Lisp routines, but that's kind of grisly.
Regards,
Richard