[XeTeX] Re: XeTeX & Unicode vs. standard LaTeX

christopher ciotti chris_ciotti at yahoo.com
Sun Oct 10 21:52:04 CEST 2004


On Oct 10, 2004, at 3:26 PM, Jonathan Kew wrote:

> Hi Zsolt,
>
> Thanks for your message. A couple of comments below. (Copied to XeTeX 
> list with Zsolt's permission, as I think the response will be of wider 
> interest.)
>

I have been wondering about putting together a Unicode version of 
textcomp.sty.  I have a rudimentary collection of \[re]newcommand 
statements in a .sty file that makes common text symbols easier to 
type; I started it the other day after running into trouble with 
quotes and the $ sign.  I'm not really sure how to implement this 
properly, but if anyone is interested in what I have, I'll post it.  
At the very least, it might save some typing.


> On 9 Oct 2004, at 9:32 pm, Zsolt Kiraly wrote:
>
>> Hi Jonathan,
>>
>> I saw on the mailing list that there is some discussion on whether 
>> XeTeX should be LaTeX compatible regarding curly quotes, dashes, 
>> apostrophes, etc. Some people would like complete compatibility, and 
>> others think that we would be better off writing our text in pure 
>> Unicode with Unicode quotes, Unicode dashes, and so on. But you know 
>> all of this.
>>
>> For me the problem of writing Unicode documents lies in the keyboard. 
>> The current Mac keyboards are not built to write Unicode curly quotes 
>> and dashes. It is inconvenient to look up the code table for every 
>> apostrophe and endash.
>
> The Mac U.S. English keyboard (and other keyboards, I assume) has had 
> conventions for entering these characters for a long time: option-[ 
> and option-] for opening curly quotes, and shift-option for the 
> closing versions; and option-hyphen and shift-option-hyphen for en- 
> and em-dashes. But I'm sure many users are unaware of these. Programs 
> like MS Word tend to "auto-correct" simple ASCII typing with a "smart 
> quotes" feature, etc., and TeX users, of course, are familiar with its 
> ASCII-based conventions, which are often more convenient to type than 
> the modifier-key combinations used in the MacRoman layouts.
>
>> Maybe the solution would be to use a preprocessor that 
>> converted standard LaTeX quotes, dashes, etc. into their Unicode 
>> equivalents and gave its output to XeTeX to process. People who 
>> wanted LaTeX compatibility would be happy, and people who wanted 
>> straight Unicode would have the ability to turn off the preprocessor.
>>
>> The T1.enc file has a set of standard LaTeX ligatures to enforce, 
>> although the ' apostrophe would still need to be mapped to the curly 
>> apostrophe.
>>
>> All of this must be transparent to the user, and a simple option to 
>> the XeTeX executable should be enough to turn the preprocessor on or 
>> off. This way \include-ed files and BibTeX and index files would also 
>> be automatically preprocessed if the option is on.
>>
>> Do you think this would solve a lot of people's problems? I'd be 
>> interested in any thoughts you might have on this subject.
>
> I don't think a preprocessor is the right way to solve this. For one 
> thing, it would be impossible for a preprocessor (unless it included a 
> full TeX parser and macro system!) to know whether there might be 
> instances of "--", for example, that *shouldn't* be converted to 
> \char"2013. Would this be a problem in practice? Yes! Imagine 
> typesetting a document that includes fragments of C/C++ source code; 
> "--" is a common C operator.
>
> These TeX conventions are actually implemented as ligatures, and the 
> right place to solve the problem is where ligatures are defined: at 
> the font level. It would be possible for AAT or OpenType fonts to 
> include ligature rules for these typical TeX conventions. (Note, 
> incidentally, that not all the standard TeX fonts implement the same 
> set of ligatures; there's no "--" ligature in cmtt, for example. This 
> is also a clue that a preprocessor, which would be unaware of fonts, 
> is not the answer.)
>
> However, we obviously cannot expect mainstream font vendors to add 
> support for TeX's unique keying conventions to their font tables. 
> Therefore, I have just implemented a "font mapping" scheme (this was 
> first suggested on the XeTeX list by Ross Moore, IIRC), which allows 
> an arbitrary mapping of Unicode character sequences to be associated 
> with a particular font. So having defined a mapping "tex-text" that 
> includes entries such as:
>
>     U+002D U+002D         >  U+2013 ; endash
>     U+002D U+002D U+002D  >  U+2014 ; emdash
>     U+0060 U+0060         >  U+201C ; opening double quote
>     ; etc....
>
> I can then load a font with a command like
>
>     \font\pal = "Palatino:mapping=tex-text" at 12pt
>
> and whenever this font is used, XeTeX will pass the Unicode character 
> sequence to be typeset (at the lowest level, after all macro 
> expansion, etc.) through this mapping, and the standard TeX ligatures 
> will work as expected.
>
> This was just implemented on Friday, and seems to be working well. It 
> will be present in the next release of XeTeX (along with that OpenType 
> ligature bug-fix, and perhaps another feature or two). Stay tuned! :-)
>
> Jonathan
>
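
To make the font-mapping idea quoted above a bit more concrete, here 
is a rough Python sketch of how I picture it behaving.  This is only 
an illustration, not XeTeX's actual implementation: the names 
TEX_TEXT and apply_mapping are made up, the table holds just the 
three entries quoted above, and the "longest match first, left to 
right" rule is my assumption about how overlapping entries such as 
"--" and "---" would be resolved.

```python
# Hypothetical table modelled on the three "tex-text" entries quoted above.
TEX_TEXT = {
    "--": "\u2013",   # U+002D U+002D        > U+2013 (endash)
    "---": "\u2014",  # U+002D U+002D U+002D > U+2014 (emdash)
    "``": "\u201C",   # U+0060 U+0060        > U+201C (opening double quote)
}


def apply_mapping(text, mapping):
    """Rewrite text left to right, preferring the longest matching entry.

    An empty mapping stands for a font loaded without any mapping,
    so the text passes through unchanged.
    """
    # Try longer sequences first so "---" wins over "--" (an assumption).
    keys = sorted(mapping, key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(mapping[key])
                i += len(key)
                break
        else:
            # No entry starts at this position: copy the character as-is.
            out.append(text[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    # Text font loaded with the mapping: TeX conventions are converted.
    print(apply_mapping("pages 10--12", TEX_TEXT))
    # Typewriter font loaded without a mapping: C code is left alone.
    print(apply_mapping("while (i--) { }", {}))
```

The point of the second call is that the decision is made per font, 
which is exactly what a preprocessor running over the whole source 
file could not do.
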
-- 
chris ciotti <chris_ciotti at yahoo.com>
http://www.keyserver.net/en/
Key ID: 0x0BD2B97A