[XeTeX] a package for transition rules based on Unicode block (allowing amongst other things automatic font switching between languages)

Mon Oct 19 15:53:09 CEST 2009

On 19 Oct 2009, at 14:37, Michiel Kamermans wrote:

> Vafa Khalighi wrote:
>>
>> Thanks heaps for your quick answer.
>>
>> This is something I tried, but it still typesets the Persian sentence
>> from left to right instead of right to left.
> I shall have a look at this. Environments may need some protection to
> prevent things from going screwy, but I also see the following:
>
> ...
> \setTransitionsForArabic{\setRTL\fontspec[Script=Arabic]{Arial}}{\setLTR}
> \begin{document}
> این یک آزمایش است.
> ...
>
> yields a PDF with:
>
> نﯾا ﮏﯾ شﯾﺎﻣزآ .تﺳ
>
> The reason for this is that the space is a character from the Basic  
> Latin block.

I doubt that this is the issue, as (xe)tex doesn't treat <space> as a
"character" at all; it is converted to inter-word glue. So its
interchar class is never used in normal cases (unless you change its
catcode, which would disrupt all kinds of other things as well).

However, you *will* see transitions at the beginning and end of each  
word, from class 255 ("boundary") to the class of the word's actual  
characters at the beginning, and from the actual characters to 255 at  
the end. If you're toggling direction on these transitions, then even  
though each word is RTL, the overall stream of words will be LTR.
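
The effect can be modelled outside TeX. The Python sketch below is only
an illustration, not XeTeX's implementation: char_class and transitions
are hypothetical names, the classifier merely tests for the Arabic block
(U+0600 to U+06FF), and only changes of class are recorded. Each space
is treated as a boundary (class 255), as described above.

```python
BOUNDARY = 255  # the class number the post gives for "boundary"


def char_class(ch):
    """Toy classifier (an assumption): Arabic block versus everything else."""
    return "arabic" if "\u0600" <= ch <= "\u06ff" else "other"


def transitions(text):
    """Return the class changes seen while scanning text.

    A space is modelled as glue, i.e. as a boundary, not as a character.
    The start and end of the text also count as boundaries.
    """
    prev = BOUNDARY
    changes = []
    for ch in text:
        cur = BOUNDARY if ch == " " else char_class(ch)
        if cur != prev:
            changes.append((prev, cur))
        prev = cur
    if prev != BOUNDARY:
        changes.append((prev, BOUNDARY))
    return changes


# First two words of the Persian sentence, written with escapes so that
# the source itself contains no right-to-left text.
sample = "\u0627\u06cc\u0646 \u06cc\u06a9"
for change in transitions(sample):
    print(change)
```

For these two words the model reports four transitions: into the Arabic
class and back to the boundary, once per word. Hooks that switch to RTL
on the first kind and back to LTR on the second therefore fire for every
word, which matches the word-by-word reversal seen in the PDF.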

For a better chance of this working, you should probably keep track of  
a "current direction", and only change it when actually encountering a  
character with the opposite directionality, not reset it on each word  
boundary.
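
A minimal Python model of the "current direction" idea follows. It is a
sketch under stated assumptions, not code for ucharclasses: the names
strong_direction and direction_runs are hypothetical, and directionality
is taken from the Unicode bidi class reported by
unicodedata.bidirectional (R and AL count as RTL, L as LTR, everything
else as neutral). In XeTeX the same bookkeeping would have to live in
the interchar token lists.

```python
import unicodedata


def strong_direction(ch):
    """Return "RTL", "LTR", or None for characters with no strong direction."""
    bidi = unicodedata.bidirectional(ch)
    if bidi in ("R", "AL"):
        return "RTL"
    if bidi == "L":
        return "LTR"
    return None  # spaces, punctuation, digits, ...


def direction_runs(text, initial="LTR"):
    """Split text into runs, changing direction only on an opposite strong character."""
    runs = []  # list of [direction, run_text]
    current = initial
    for ch in text:
        direction = strong_direction(ch)
        if direction is not None and direction != current:
            current = direction  # the only place the direction changes
        if runs and runs[-1][0] == current:
            runs[-1][1] += ch
        else:
            runs.append([current, ch])
    return [(direction, run) for direction, run in runs]


# Latin text, two Persian words with a full stop, then Latin text again.
sample = "abc \u0627\u06cc\u0646 \u06cc\u06a9. xyz"
for direction, run in direction_runs(sample):
    print(direction, len(run))
```

In this model the space between the two Persian words and the full stop
after them stay inside a single RTL run, because neither is a strong
character; the direction only returns to LTR at the "x".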

> As already pointed out in the documentation, ucharclasses will  
> produce questionable results when using languages that overlap with  
> others in the document, such as Vietnamese, or in this case Arabic.
>
> In order for ucharclasses to be genuinely useful in these settings,  
> it may need some kind of command for temporarily reassigning overlap  
> punctuation to a specific class or informal group... but that's  
> going to need a bit more thinking.

It would probably be more useful to work in terms of *scripts* (see
http://www.unicode.org/Public/UNIDATA/Scripts.txt) than Unicode blocks;
this is much closer to being a semantically meaningful categorization.
And you'll notice that lots of punctuation, etc., is assigned to
"Common" rather than to a specific script.
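
As a rough illustration of a script-based lookup, here is a Python
sketch that reads lines in the "range ; Script # comment" layout used by
Scripts.txt. The names parse_scripts and script_of are hypothetical, and
SAMPLE is a handful of hand-written lines in that layout, not a verbatim
excerpt; a real table would be built from the full file.

```python
# Hand-written lines in the Scripts.txt layout (an assumption for the demo).
SAMPLE = """
0020          ; Common # SPACE
002E          ; Common # FULL STOP
0041..005A    ; Latin  # LATIN CAPITAL LETTER A..Z
060C          ; Common # ARABIC COMMA
0620..063F    ; Arabic # Arabic letters
"""


def parse_scripts(text):
    """Parse 'XXXX[..YYYY] ; Script # comment' lines into (first, last, script)."""
    ranges = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not line:
            continue
        span, script = (part.strip() for part in line.split(";"))
        first, _, last = span.partition("..")
        ranges.append((int(first, 16), int(last or first, 16), script))
    return ranges


def script_of(ch, ranges):
    """Look up the script of one character; unlisted code points are Unknown."""
    code_point = ord(ch)
    for first, last, script in ranges:
        if first <= code_point <= last:
            return script
    return "Unknown"


ranges = parse_scripts(SAMPLE)
for ch in ["A", " ", "\u060c", "\u0627"]:
    print(f"U+{ord(ch):04X}", script_of(ch, ranges))
```

The Arabic comma (U+060C) sits in the Arabic block but carries the
script Common, which is exactly the kind of shared punctuation that
makes a block-based classification awkward.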

JK