[XeTeX] Re: XeTeX & Unicode vs. standard LaTeX
Jonathan Kew
jonathan_kew at sil.org
Sun Oct 10 21:26:23 CEST 2004
Hi Zsolt,
Thanks for your message. A couple of comments below. (Copied to XeTeX
list with Zsolt's permission, as I think the response will be of wider
interest.)
On 9 Oct 2004, at 9:32 pm, Zsolt Kiraly wrote:
> Hi Jonathan,
>
> I saw on the mailing list that there is some discussion on whether
> XeTeX should be LaTeX compatible regarding curly quotes, dashes,
> apostrophes, etc. Some people would like complete compatibility, and
> others think that we would be better off writing our text in pure
> Unicode with Unicode quotes, Unicode dashes, and so on. But you know
> all of this.
>
> For me the problem of writing Unicode documents lies in the keyboard.
> The current Mac keyboards are not built to write Unicode curly quotes
> and dashes. It is inconvenient to look up the code table for every
> apostrophe and endash.
The Mac U.S. English keyboard (and other keyboards, I assume) has had
conventions for entering these characters for a long time: option-[ and
option-] for the opening double and single curly quotes, and
shift-option with the same keys for the closing versions; and
option-hyphen and shift-option-hyphen for en- and em-dashes. But I'm
sure many users are unaware of these. Programs like
MS Word tend to "auto-correct" simple ASCII typing with a "smart
quotes" feature, etc., and TeX users, of course, are familiar with its
ASCII-based conventions, which are often more convenient to type than
the modifier-key combinations used in the MacRoman layouts.
> Maybe the solution would be in the use of a preprocessor that
> converted standard LaTeX quotes and dashes, etc into their Unicode
> equivalents and gave its output to XeTeX to process. People who wanted
> LaTeX compatibility would be happy, and people who wanted straight
> Unicode would have the ability to turn off the preprocessor.
>
> The T1.enc file has a set of standard LaTeX ligatures to enforce,
> although the ' apostrophe would still need to be mapped to the curly
> apostrophe.
>
> All of this must be transparent to the user, and a simple option to
> the XeTeX executable should be enough to turn the preprocessor on or
> off. This way \include-ed files and BibTeX and index files would also
> be automatically preprocessed if the option is on.
>
> Do you think this would solve a lot of people's problems ? I'd be
> interested in any thoughts you might have on this subject.
I don't think a preprocessor is the right way to solve this. For one
thing, it would be impossible for a preprocessor (unless it included a
full TeX parser and macro system!) to know whether there might be
instances of "--", for example, that *shouldn't* be converted to
\char"2013. Would this be a problem in practice? Yes! Imagine
typesetting a document that includes fragments of C/C++ source code;
"--" is a common C operator.
These TeX conventions are actually implemented as ligatures, and the
right place to solve the problem is where ligatures are defined: at the
font level. It would be possible for AAT or OpenType fonts to include
ligature rules for these typical TeX conventions. (Note, incidentally,
that not all the standard TeX fonts implement the same set of
ligatures; there's no "--" ligature in cmtt, for example. This is also
a clue that a preprocessor, which would be unaware of fonts, is not the
answer.)
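The objection to a preprocessor can be illustrated with a small sketch.
This is not an actual tool from the discussion; the function name
naive_preprocess and the sample strings are hypothetical, and the code
assumes the simplest possible design: a blind textual replacement with
no knowledge of TeX macros, verbatim material, or the current font.

```python
def naive_preprocess(source):
    """Blindly replace TeX dash conventions, longest sequence first.

    Hypothetical sketch: it has no TeX parser, so it cannot tell prose
    from verbatim code, and it knows nothing about which font is active.
    """
    return source.replace("---", "\u2014").replace("--", "\u2013")


prose = "pages 10--20"
code = r"\begin{verbatim}while (n-- > 0) total += n;\end{verbatim}"

# The prose case does what the author wants: an en-dash between numbers.
print(naive_preprocess(prose))

# The verbatim case is silently corrupted: the C decrement operator
# has become an en-dash, so the listing no longer shows valid C.
print(naive_preprocess(code))
```

Avoiding the second result would require the preprocessor to understand
verbatim environments, catcode changes, and font switches, which is
effectively the full TeX parser mentioned above.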
However, we obviously cannot expect mainstream font vendors to add
support for TeX's unique keying conventions to their font tables.
Therefore, I have just implemented a "font mapping" scheme (this was
first suggested on the XeTeX list by Ross Moore, IIRC), which allows an
arbitrary mapping of Unicode character sequences to be associated with
a particular font. So having defined a mapping "tex-text" that includes
entries such as:
    U+002D U+002D > U+2013 ; endash
    U+002D U+002D U+002D > U+2014 ; emdash
    U+0060 U+0060 > U+201C ; opening double quote
    ; etc....
I can then load a font with a command like
    \font\pal = "Palatino:mapping=tex-text" at 12pt
and whenever this font is used, XeTeX will pass the Unicode character
sequence to be typeset (at the lowest level, after all macro expansion,
etc.) through this mapping, and the standard TeX ligatures will work as
expected.
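As a rough illustration of the idea (not XeTeX's actual implementation),
the behaviour can be sketched as a per-font table applied to the final
character sequence. The names TEX_TEXT, apply_mapping, and typeset are
hypothetical, the table holds only the three entries quoted above, and
the longest-match-first rule is an assumption made so that "---" is not
consumed as "--" followed by a hyphen.

```python
# Hypothetical in-memory stand-in for the three mapping entries above.
TEX_TEXT = {
    "--": "\u2013",   # U+002D U+002D > U+2013, endash
    "---": "\u2014",  # U+002D U+002D U+002D > U+2014, emdash
    "``": "\u201C",   # U+0060 U+0060 > U+201C, opening double quote
}


def apply_mapping(text, mapping):
    """Rewrite text left to right, trying the longest source sequence first."""
    keys = sorted(mapping, key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(mapping[key])
                i += len(key)
                break
        else:
            # No mapping entry starts here: pass the character through.
            out.append(text[i])
            i += 1
    return "".join(out)


def typeset(text, font_mapping=None):
    """Apply a mapping only when the selected font was loaded with one."""
    if font_mapping is None:
        return text
    return apply_mapping(text, font_mapping)


# A text font loaded with the mapping gets the TeX conventions...
print(typeset("``Yes---pages 1--5", TEX_TEXT))
# ...while a font loaded without it (say, for code listings) does not.
print(typeset("while (n-- > 0)", None))
```

Because the table belongs to the font rather than to the input file, the
same source can mix mapped text fonts and unmapped code fonts, which is
the case a preprocessor cannot handle.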
This was just implemented on Friday, and seems to be working well. It
will be present in the next release of XeTeX (along with that OpenType
ligature bug-fix, and perhaps another feature or two). Stay tuned! :-)
Jonathan