[XeTeX] default char classes
Barry MacKichan
barry at mackichan.com
Wed Mar 12 14:31:02 CET 2008
Jonathan, you have convinced me that language markup is needed.
Actually, with our mostly-WYSIWYG front end, you have to specify RTL
when appropriate in order to keep the cursor from jumping every time you
type a space -- the editor gets the direction from the font but then
thinks the direction has changed when it sees the space.
What I am getting out of this discussion is that the user should not
think that he is specifying a font with a tag -- with many Unicode fonts
this is unnecessary -- but he is specifying a language. And the language
determines much more than the font ...
I am curious about Will's question. Are there efficiency concerns in
defining lots of large token classes?
--Barry
> Message: 1
> Date: Sun, 9 Mar 2008 16:07:59 +0000
> From: Jonathan Kew <jonathan_kew at sil.org>
> Subject: Re: [XeTeX] default char classes
> To: barry.mackichan at mackichan.com, Unicode-based TeX for Mac OS X and
> other platforms <xetex at tug.org>
> Message-ID: <320757AA-4287-4530-BDE5-AD6E330BD57E at sil.org>
> Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed
>
> On 9 Mar 2008, at 3:18 pm, Barry MacKichan wrote:
>
>
>> Yes, that is how we do it now.
>>
>> I don't actually write multilingual documents myself, but we sell
>> software (Scientific WorkPlace, etc.) that does, and so we are
>> looking for ways to make things simpler for our customers.
>>
>> The main thing I'm after is to reinforce the concept in LaTeX of
>> separating content and form. The choice of a font for a particular
>> range of Unicode characters is strictly a matter of form, yet the
>> author has to do different things in his document, depending on his
>> choice of fonts.
>>
>> 1. If he uses a font like Minion Pro, which contains Hebrew
>> characters, he needs to do nothing.
>>
>
> He still needs to get \beginR....\endR (or something higher-level
> that resolves to this) around the Hebrew text somehow, doesn't he?
> That doesn't happen automatically.
>
> Now someone will no doubt tell me that it should! Perhaps; but again,
> there's a limit to what can be done automatically. Given source text
> that contains
>
> latin latin HEBREW HEBREW latin latin HEBREW HEBREW latin latin.
>
> do we have a Latin-script sentence containing two separate Hebrew
> phrases, or is that a single Hebrew phrase that itself contains an
> embedded Latin quote? There's no way to know without some kind of
> markup or higher-level information, and it matters for layout. In
> other words, there's a crucial difference between these two:
>
> latin latin \beginR HEBREW HEBREW \endR latin latin \beginR
> HEBREW HEBREW \endR latin latin.
>
> latin latin \beginR HEBREW HEBREW \beginL latin latin \endL
> HEBREW HEBREW \endR latin latin.
>
> and only the author can tell us -- via markup -- which is intended.
>
> Or to take a "simpler" example, if our source text is
>
> latin latin HEBREW HEBREW? latin latin.
>
> are we looking at a single Latin-script sentence that contains a
> Hebrew quote that ends with a question mark, or are we looking at a
> Latin question (containing a couple of Hebrew words), and then a
> second Latin sentence? The answer to this will determine where the
> question mark appears in the reordered text -- is it part of the
> Hebrew inclusion (in which case it appears to the left), or part of
> the surrounding Latin script (and appears to the right)?
>
> JK
>
>
>
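For reference, the two readings of the question-mark example quoted
above could be marked up roughly as follows. This is only a sketch in
plain XeTeX: HEBREW is the same placeholder used in the quoted message,
not real Hebrew text, and \TeXXeTstate=1 is assumed to be needed before
the \beginR and \endR primitives take effect.

```tex
% Enable the e-TeX direction primitives (\beginL, \endL, \beginR, \endR).
\TeXXeTstate=1

% Reading 1: the question mark is part of the Hebrew inclusion, so it
% sits inside the right-to-left run and is set at the left end of it.
latin latin \beginR HEBREW HEBREW?\endR{} latin latin.

% Reading 2: the question mark belongs to the surrounding Latin text,
% so it stays outside the run and is set to the right of the Hebrew.
latin latin \beginR HEBREW HEBREW\endR? latin latin.

\bye
```

The source differs only in which side of \endR the question mark falls
on, which is the information the author has to supply through markup.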