[XeTeX] Re: New feature request for XeTeX
jonathan_kew at sil.org
Mon Jul 26 13:43:57 CEST 2004
Interesting idea; I'll think about it a bit. Some initial comments:
> What I have in mind could be called a "font mapping",
> or "encoding mapping".
> This would be applied on output, whenever a particular font-variant
> is being used, both when obtaining the size of a sequence of
> letters/glyphs and in the final output.
(Just an aside: seems to me that this is closer to the input side of
things than the output.) In effect, you're asking for the ability to
apply a custom mapping from 7-bit codes to Unicode values, specified as
part of the \font declaration. Is that roughly the idea?
> What I want to do is to be able to continue to use the old TeX sources,
> but to end up with the Unicode code-points in the output
In principle, of course, you can do this by \catcoding the custom input
characters as active chars, and then \defining them as macros that
generate the appropriate Unicode codepoints. But that gets really
cumbersome if you need lots of them, and also need to use the same
(input) characters within control sequences, etc.
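For comparison, the pure-TeX workaround might look something like this (a minimal sketch; the choice of "@" for schwa follows tipa's input conventions, everything else is illustrative):

\catcode`\@=\active
\def@{\char"0259\relax}  % "@" -> U+0259, LATIN SMALL LETTER SCHWA

Multiply that by a few dozen characters, add the trouble of using those same characters inside control sequences, and the appeal of a declarative mapping becomes obvious.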
> Having a font-mapping available, there is an obvious way forward,
> that would alleviate completely the need for pre-processors, and
> be easier for a user to configure for his/her own needs.
> The \textipa command would cause the mapping to be applied to the
> result (*after* macro-expansions) of the usual processing of its
> argument.
> Currently \textipa sets the font encoding to T3 for its processing,
> and switches font to have the correct metrics available.
Not having touched LaTeX in many years (I used 2.09 once upon a time),
I don't know how input encodings, font encodings, etc. are handled by
current LaTeX packages. What does "sets the font encoding to T3"
actually do, in TeX terms?
> With XeTeX, the change would be to an AAT font; but there needs
> to be an appropriate mapping either:
> (i) of 7-bit characters directly to Unicode points;
> (ii) of 7-bit characters to (La)TeX macros, and further processing
> to get the Unicode points.
> The latter would be more flexible, I think -- though perhaps harder
> to integrate seamlessly into TeX's workflow.
So for (i) do you envisage an extension to the \font command, perhaps
\font\tipalucida="Lucida Grande:mapping=tipa" at 10pt
which would load an external file such as "tipa.xmp" (the extension
.map is used for too many things already!), containing a mapping to be
applied to character codes when using this font.
Or should the mapping be defined entirely at the TeX source level?
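To make the question concrete: a file like "tipa.xmp" might be nothing fancier than a two-column list of code pairs. (The file format here is pure invention on my part, though the tipa input conventions shown are real.)

% hypothetical tipa.xmp: 7-bit input code -> Unicode code point
0x40  0x0259   % "@" -> LATIN SMALL LETTER SCHWA
0x45  0x025B   % "E" -> LATIN SMALL LETTER OPEN E
0x53  0x0283   % "S" -> LATIN SMALL LETTER ESH

A format like that would be trivial to parse at \font-loading time, and easy for users to write for their own legacy encodings.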
Your (ii) seems to me to be a major extension, and probably hard to
design and do well. After all, we don't have a sequence of character
codes in a specific font until *after* macro processing; at that point,
you envisage replacing characters with macros and re-processing? Could
this process recurse indefinitely? It sounds like this could become an
extension on a similar scale to Omega's OTPs and all that stuff, and I
don't want to get into designing something of that scope.
If you need something more than a simple character code remapping for
certain characters, perhaps those instances could be handled as \active
characters in TeX, while the "mapping=..." option would allow you to
remap the majority of simple characters without having to \activate
just about everything. Reasonable compromise?
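Concretely, the compromise might look like this (reusing the hypothetical mapping= syntax from above; the use of "|" for a nasalized n is invented, but it illustrates a one-input-to-two-outputs case that a simple code-for-code mapping cannot express):

\font\tipalucida="Lucida Grande:mapping=tipa" at 10pt
\catcode`\|=\active
\def|{\char"006E\relax\char"0303\relax}  % n + COMBINING TILDE

The mapping file handles the bulk of the simple substitutions; only the genuinely complex cases pay the cost of active-character treatment.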
> As for mathematics, this would make it *much, much easier* to get
> consistent styles for mathematics in a document using an AAT text-font.
> This is because there are already code-points for slanted/italicized
> math-characters, math symbols, extension-brackets, fraktur, etc.
> Appropriate font-mappings for cmmi, cmsy, cmex would be easy to write.
> (Even some super/subscripts can be supported without changing
I suspect this will prove much harder than you think.
TeX relies heavily on special metrics in the TFM file to control math
typesetting, and when XeTeX loads and uses an AAT font there is no TFM
file involved! It measures runs of text by calling ATSUI, but that only
provides the basic width of a character sequence; it doesn't have
per-character height, depth, and italic correction. And there's no
place for XeTeX to get the "extensible recipe" used when constructing
large delimiters, etc.
So this font-mapping mechanism could give you easier access to simple
characters in Unicode fonts (while keeping source text in legacy
"hacked ASCII" encodings), but I'm doubtful that it would enable you to
replace cmex with a Unicode version. For that, I think we really need a
Unicode-based extension of the TFM file--which Omega has done, hasn't
it? But XeTeX doesn't currently read OFMs.
Given the limitations, would this still be a worthwhile extension?