[XeTeX] xetex and the unicode bidirectional algorithm.

Jonathan Kew jfkthame at googlemail.com
Thu Dec 5 11:41:38 CET 2013

On 4/12/13 13:24, C. Scott Ananian wrote:
> The goal is to match the Unicode bidi algorithm, because that is how the
> web page displays and thus how the original author saw the text as they
> wrote.

This would be a nice enhancement, but would require a significant amount 
of work (or in other words, it's not likely to get implemented quickly, 
if at all).

Currently, typesetting bidi text with xetex requires correct use of the
TeX--XeT bidi commands (\beginR, \endR, \beginL, \endL) to mark up the
text direction. These can be used directly, or generated by higher-level
markup that tags script and language (as polyglossia and bidi do), but
they definitely need to be present in some way.

Sorry, that's not what you want to hear, but it's how things are. At
this point, I think the most practical way forward in your situation is
probably to implement this as part of whatever tool takes the
Wikipedia content and converts it to (Xe)LaTeX markup: that tool
could inspect the content of each element it processes, and add any
necessary direction controls for XeTeX.
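
As a rough illustration of that kind of preprocessing, here is a minimal
Python sketch. It is not part of XeTeX or of any existing converter: the
function name mark_ltr_runs is hypothetical, it assumes the surrounding
paragraph is right-to-left, and it only approximates the Unicode bidi
algorithm by looking at each character's bidi class (from the standard
unicodedata module). It also assumes TeX--XeT is enabled
(\TeXXeTstate=1) and that TeX special characters have already been
escaped elsewhere.

```python
import unicodedata

# Bidi classes treated as strongly right-to-left (Hebrew, Arabic, ...).
RTL_CLASSES = {"R", "AL"}
# Classes allowed to end an LTR run: strong LTR letters and European digits.
# Counting digits here is a simplification, not the full Unicode rules.
RUN_END_CLASSES = {"L", "EN"}


def mark_ltr_runs(text: str) -> str:
    """Wrap left-to-right runs in \\beginL ... \\endL (hypothetical helper).

    Assumes the paragraph direction is RTL. A run starts at a strong LTR
    character and extends over spaces, digits and punctuation up to the
    last LTR letter or digit before the next strong RTL character.
    """
    out = []
    i, n = 0, len(text)
    while i < n:
        if unicodedata.bidirectional(text[i]) != "L":
            out.append(text[i])
            i += 1
            continue
        end = i  # index of the last character that belongs to the run
        j = i + 1
        while j < n:
            cls = unicodedata.bidirectional(text[j])
            if cls in RTL_CLASSES:
                break
            if cls in RUN_END_CLASSES:
                end = j
            j += 1
        # "{}" after \endL stops TeX from swallowing a following space.
        out.append("\\beginL " + text[i:end + 1] + "\\endL{}")
        i = end + 1
    return "".join(out)


if __name__ == "__main__":
    # Hebrew letters written as escapes to keep this listing ASCII-only.
    sample = "\u05d0 Open Source 2.0 \u05d1"
    print(mark_ltr_runs(sample))
```

Spaces and punctuation at the edges of a run are left outside the
markers, so they follow the paragraph direction. A real converter would
also have to handle nesting, LTR paragraphs, and explicit direction
marks, none of which this sketch attempts.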


> Guessing the proper language tag to use is likely infeasible;
> note that the example given contains titles in Turkish as well as
> English.  The safest option is probably to treat embedded LTR text in an
> RTL context as 'exotic' and not to attempt hyphenation.
> I've heard it said that LuaTeX has "better bidi support".  What does
> that mean, exactly? Should I be considering switching?
>    --scott
> On Dec 4, 2013 4:08 AM, "Keith J. Schultz" <schultzk at uni-trier.de>
> wrote:
>     Hi Scott,
>     On 03.12.2013 at 19:42, C. Scott Ananian <cscott at cscott.net>
>     wrote:
>      >
>      > But in the XeLaTeX/polyglossia/bidi output, the "soft space" weak
>      > directionality of the Unicode BiDi algorithm doesn't seem to be
>      > honored (or implemented?) and so the English article titles appear
>      > with the individual words in RTL order, which is a mess.  Manually
>      > tagging the language of the article title is probably the Right
>      > thing, but infeasible for the entire wikipedia.
>              Well, without proper tagging you cannot expect any system to
>              work properly or as expected!
>              For most entries a simple script should do the trick to add the
>              language tags to the article titles.
>     Hope this helps
>              regards
>                      Keith.
>     --------------------------------------------------
>     Subscriptions, Archive, and List information, etc.:
>     http://tug.org/mailman/listinfo/xetex
