[XeTeX] Should xelatex have its own kernel? (was: "Conflict between xunicode and fontspec?")

François Charette firmicus at ankabut.net
Thu Feb 7 09:28:56 CET 2008


Dear all,

The above discussion (“Conflict between xunicode and fontspec?”) raised 
wider-ranging issues about xelatex, so I decided to continue it in this 
new thread.

>   Actually there is an interesting issue here, and it can be connected
> with another recent thread, the one about soft hyphen: I think some of
> these characters should be handled on a lower level than
> language-specific packages, probably in the format itself.
>   
I agree here. Following the same thread, I was boldly wondering whether 
we might consider a format that is specific to xelatex. That is, instead 
of creating xelatex.fmt directly from latex.ltx, we would rename it to 
xelatex.ltx and make all the changes and additions we think should be 
available by default to xelatex users in our own kernel, with the 
objective of enhancing it for our purposes, without of course breaking 
compatibility with standard LaTeX. This would consist of:

* removing obsolete code in the kernel and patching it with fixltx2e.sty,

* including the code from etex.sty and probably part of the code from 
xltxtra.sty,

* and adding other enhancements as well (see further below).

This is a rather spontaneous idea, and I have not thought out its 
practical implications, but I would be interested to know what you think.


> Such characters include the Unicode soft hyphen, as has already been
> discussed, and various space characters, as you and Adam pointed out,
> including the ISO 8859 no-break space (Unicode U+00A0 NO-BREAK SPACE).
> I'm convinced that handling the latter as equivalent to the usual TeX
> tie (~) is the right thing to do here, that is, making U+00A0 active and
> defining it as \penalty10000\space or something like that (just like ~
> in every TeX dialect).  <..> 
> But for some particular Unicode characters, it seems advisable and the
> most straightforward thing to do; I would say most characters with
> General Category C (U+00AD SOFT HYPHEN is one of them, having gc=Cf),
> as well as some space characters (space is anyway an active thing in
> TeX, if you think about it).
>
>   Anyway, what I wanted to stress here was that it might be advisable to
> think in two steps: first, handle the characters according to their
> semantics as defined by Unicode (this would be XeLaTeX's job) and then,
> only, make some fine language-dependent typographical treatment
> (Polyglossia's job) <...>
>
>   What do you think?
>
>   

Again, that makes perfect sense.
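
To make the U+00A0 case concrete, here is a minimal sketch of what the 
kernel (or xltxtra.sty) might do. It is only an illustration, not a 
finished implementation: \nobreakspace is the standard LaTeX macro 
behind ~, \protected is the e-TeX primitive that keeps the character 
from expanding in moving arguments, and ^^^^00a0 is the ASCII notation 
XeTeX accepts for U+00A0 (used here so that the character stays visible 
in this mail).

```latex
% Sketch only; compile with xelatex.
\documentclass{article}

\catcode"00A0=\active                   % U+00A0 NO-BREAK SPACE becomes active
\protected\def^^^^00a0{\nobreakspace}   % same behaviour as the usual tie ~

\begin{document}
% ^^^^00a0 below stands for a literal U+00A0 typed in the source.
See Figure^^^^00a01 on page^^^^00a042.
\end{document}
```

Doing this in the format rather than in a package would mean that the 
definition is in place before any language-specific code is loaded.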

Here is a list of Unicode "space and format characters" that could be 
made active in the xelatex kernel (or in xltxtra.sty in case we decide 
not to go down that road), and defined with appropriate macros:

U+00AD -> {\-}
U+00A0 -> {~}
U+2000 -> {\enspace}
U+2001 -> {\quad}
U+2002 -> {\enspace}
U+2003 -> {\quad}
U+2004 -> {\kern0.3333em} ? "three-per-em space"
U+2005 -> {\kern.25em} ? "four-per-em space"
U+2006 -> {\thinspace} ? "six-per-em space" = 1/6 em = \thinspace
U+2007 -> ? "figure space"
U+2008 -> ? "punctuation space"
U+2009 -> {\kern.2em} (or {\thinspace})
U+200A -> ? "hair space" perhaps {\kern.05em} ?
U+200B -> {\hskip 0pt plus .4em}
U+2028 -> {\break}
U+2029 -> {\par}
U+202F -> {\unskip\thinspace} ? "narrow no break space"
U+205F -> {\kern0.2222em} ? "medium mathematical space"
U+2060 -> {\nobreak} ?  "word joiner" (zero-width non-breaking space)
U+2061 -> ? invisible operator "function application"
U+2062 -> ? "invisible times"
U+2063 -> ? "invisible separator"
                  the above three are for Will! ;)
U+FEFF -> {\nobreak} ??

(I certainly forgot some…)

As you can see, in many cases I am unsure how to map the above Unicode 
characters to TeX typographical commands. But I am sure many readers of 
this list will know better!
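
For the entries in the list that are not in doubt, the mapping could be 
set up roughly as follows. This is a sketch under assumptions: 
\mapunicodechar is a hypothetical helper name, not an existing command, 
and the \relax after the U+200B glue is my addition so that a following 
word such as "minus" is not read as part of the glue specification.

```latex
% Sketch only; compile with xelatex.
\documentclass{article}

% \mapunicodechar{<HEX>}{<definition>} is a hypothetical helper.
% <HEX> is the code point written with uppercase hexadecimal digits.
\newcommand\mapunicodechar[2]{%
  \catcode"#1=\active        % make the character active
  \begingroup
    \lccode`\~="#1\relax     % \lowercase will turn ~ into that character
    \lowercase{\endgroup\protected\def~}{#2}%
}

\mapunicodechar{00AD}{\-}                          % SOFT HYPHEN
\mapunicodechar{00A0}{\nobreakspace}               % NO-BREAK SPACE
\mapunicodechar{2002}{\enspace}                    % EN SPACE
\mapunicodechar{2003}{\quad}                       % EM SPACE
\mapunicodechar{2009}{\thinspace}                  % THIN SPACE
\mapunicodechar{200B}{\hskip 0pt plus .4em\relax}  % ZERO WIDTH SPACE

\begin{document}
% The ^^^^xxxx sequences stand for the literal Unicode characters.
hy^^^^00adphen^^^^00adation, a^^^^2003b, 10^^^^2009km
\end{document}
```

The \lowercase trick is only there so that the helper can take the code 
point as a number; the undecided entries (figure space, punctuation 
space and so on) are deliberately left out.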

There are also the format characters U+200E, U+200F, and U+202A–U+202E 
that should perhaps be handled by my package bidi.sty, although I am not 
sure if this is technically feasible or even appropriate.

Best,
François




