[XeTeX] Should xelatex have its own kernel? (was: "Conflict between xunicode and fontspec?")
François Charette
firmicus at ankabut.net
Thu Feb 7 09:28:56 CET 2008
Dear all,
The above discussion (“Conflict between xunicode and fontspec?”) raised
wider-ranging issues about xelatex, so I decided to continue it in this
new thread.
> Actually there is an interesting issue here, and it can be connected
> with another recent thread, the one about soft hyphen: I think some of
> these characters should be handled on a lower level than
> language-specifics packages, probably in the format itself.
>
I agree here. Following the same thread, I was wondering whether we
might consider a format that is specific to xelatex; that is, instead of
creating xelatex.fmt directly from latex.ltx, we would rename it to
xelatex.ltx and make all the changes and additions we think should be
available by default to xelatex users in our own kernel, with the
objective of enhancing it for our purposes, without of course breaking
compatibility with standard LaTeX. This would consist of:
* removing obsolete code in the kernel and patching it with fixltx2e.sty,
* including the code from etex.sty and probably part of the code from
xltxtra.sty,
* and adding other enhancements as well (see further below).
This is a rather spontaneous idea, and I have not thought out its
practical implications, but I would be interested to know what you think.
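To make the idea concrete, here is a minimal sketch of what a user has to write today to get roughly what such a kernel would provide out of the box. It is only an approximation at document level, assuming the packages named above are installed; nothing here is an actual xelatex.ltx.

```latex
% Sketch only: approximates, at document level, what a dedicated
% xelatex kernel might preload. Compile with xelatex.
\RequirePackage{fixltx2e}  % patches to the LaTeX2e kernel
\RequirePackage{etex}      % access to the extended e-TeX registers
\documentclass{article}
\usepackage{xltxtra}       % loads fontspec, plus some XeLaTeX extras
\begin{document}
Hello from xelatex.
\end{document}
```

With a dedicated format, the three package lines would become unnecessary, while documents that still load them would have to keep working.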
> Such characters include the Unicode soft hyphen, as has already been
> discussed, and various space characters, as you and Adam pointed out,
> including the ISO 8859 no-break space (Unicode U+00A0 NO-BREAK SPACE).
> I'm convinced that handling the latter as equivalent to the usual TeX
> tie (~) is the right thing to do here, that is, making U+00A0 active and
> defining it as \penalty10000\space or something like that (just like ~
> in every TeX dialect). <..>
> But for some particular Unicode characters, it seems advisable and the
> most straightforward thing to do; I would say most characters with
> General Category C (U+00AD SOFT HYPHEN is one of them, having gc=Cf),
> as well as some space characters (space is anyway an active thing in
> TeX, if you think about it).
>
> Anyway, what I wanted to stress here was that it might be advisable to
> think in two steps: first, handle the characters according to their
> semantics as defined by Unicode (this would be XeLaTeX' job) and then,
> only, make some fine language-dependent typographical treatment
> (Polyglossia's job) <...>
>
> What do you think?
>
>
Again, that makes perfect sense.
Here is a list of Unicode "space and format characters" that could be
made active in the xelatex kernel (or in xltxtra.sty, should we decide
not to go down that road), and defined with appropriate macros:
U+00AD -> {\-}
U+00A0 -> {~}
U+2000 -> {\enspace}
U+2001 -> {\quad}
U+2002 -> {\enspace}
U+2003 -> {\quad}
U+2004 -> {\kern0.3333em} ? "three-per-em space"
U+2005 -> {\kern.25em} ? "four-per-em space"
U+2006 -> {\thinspace} ? "six-per-em space" = 1/6 em = \thinspace
U+2007 -> ? "figure space"
U+2008 -> ? "punctuation space"
U+2009 -> {\kern.2em} (or {\thinspace})
U+200A -> ? "hairspace" perhaps {\kern.05em} ?
U+200B -> {\hskip 0pt plus .4em}
U+2028 -> {\break}
U+2029 -> {\par}
U+202F -> {\unskip\thinspace} ? "narrow no break space"
U+205F -> {\kern0.2222em} ? "medium mathematical space"
U+2060 -> {\nobreak} ? "word joiner" (zero-width non-breaking space)
U+2061 -> ? invisible operator "function application"
U+2062 -> ? "invisible times"
U+2063 -> ? "invisible separator"
the above three are for Will! ;)
U+FEFF -> {\nobreak} ??
(I certainly forgot some…)
As you can see, in many cases I am unsure how to map the above Unicode
characters to TeX typographical commands. But I am sure many readers of
this list will know better!
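For the entries I am reasonably sure about, the mechanism could look like the sketch below. The helper \xuDeclareActive is a hypothetical name I made up for illustration; it is not an existing kernel or package macro. It uses the usual \lccode/\lowercase trick to define an active character by its code point, so no invisible characters need to appear in the source.

```latex
% Compile with xelatex. \xuDeclareActive is a hypothetical helper.
\documentclass{article}
\usepackage{fontspec}

% #1 = code point (e.g. "00A0), #2 = replacement text.
% Not starred, so that #2 may contain \par if needed.
\newcommand\xuDeclareActive[2]{%
  \begingroup
    \lccode`\~=#1\relax                      % map ~ to the target code point
    \lowercase{\endgroup\protected\def~}{#2}% define the active character
  \catcode#1=\active                         % then make the character active
}

% Only the mappings from the list that carry no question mark:
\xuDeclareActive{"00A0}{\nobreakspace}          % NO-BREAK SPACE, same as ~
\xuDeclareActive{"00AD}{\-}                     % SOFT HYPHEN
\xuDeclareActive{"2002}{\enspace}               % EN SPACE
\xuDeclareActive{"2003}{\quad}                  % EM SPACE
\xuDeclareActive{"200B}{\hskip 0pt plus .4em\relax} % ZERO WIDTH SPACE

\begin{document}
% ^^^^00a0 is XeTeX input notation for the character U+00A0.
Dr.^^^^00a0Knuth and hy^^^^00adphen^^^^00adation.
\end{document}
```

The definitions are \protected so that the characters are written unchanged to the .aux and .toc files instead of being expanded. Whether this belongs in the kernel or in a package does not change the mechanism.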
There are also the format characters U+200E, U+200F, and U+202A–U+202E
that should perhaps be handled by my package bidi.sty, although I am not
sure if this is technically feasible or even appropriate.
Best,
François