[XeTeX] xetex and the unicode bidirectional algorithm.

C. Scott Ananian cscott at cscott.net
Tue Dec 3 19:42:21 CET 2013


I'm using Xe(La)TeX for a rewrite of the PDF booklet backend for the
Wikimedia Foundation (Wikipedia).  It's going pretty well, and the
output looks good across a wide variety of Wikipedia's languages, but
I'm having an issue with mixed RTL/LTR text.  I'm using the
polyglossia package (and hence bidi), and the basic RTL support works.
But citations on the articles tend to be a mix of RTL and LTR texts --
see
https://ar.wikipedia.org/wiki/%D9%84%D9%8A%D9%88%D9%86%D9%8A%D9%84_%D9%85%D9%8A%D8%B3%D9%8A#.D9.85.D8.B5.D8.A7.D8.AF.D8.B1
for instance.  Even in HTML you can notice some issues with trailing
parentheses being given the wrong directionality -- we only recently
landed support for the <bdo> and <bdi> tags, and it looks like we need
to apply them to the citation templates.

But in the XeLaTeX/polyglossia/bidi output, the Unicode BiDi
algorithm's handling of neutral characters doesn't seem to be honored
(or implemented?).  A space between two English words should take the
direction of the strong LTR text on either side, but instead the
English article titles appear with the individual words in RTL order,
which is a mess.  Manually tagging the language of each article title
is probably the Right Thing, but infeasible across all of Wikipedia.

Does XeLaTeX implement the Unicode BiDi algorithm?  If so, why isn't
it working (I can provide a TeX sample)?  If not, does anyone have any
suggestions for workarounds -- other than implementing the BiDi
algorithm myself and adding explicit \RL and \LR commands?
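For reference, here is a minimal sketch of the "add explicit \LR commands" workaround, done as a Python preprocessing step before the text reaches XeLaTeX. It assumes the citation text has already been escaped for TeX (balanced braces, no stray specials), and `wrap_ltr_runs` is a hypothetical helper name. It uses `unicodedata.bidirectional()` from Python's standard library to classify each character, then wraps every stretch running from a strong left-to-right character (class L) to the last such character before the next strong right-to-left one (R or AL) in bidi's `\LR{...}`. That keeps the spaces and punctuation between English words inside the LTR run. This only approximates how UAX #9 resolves neutrals for this one case; it ignores explicit embeddings, bracket pairing, and the paragraph base direction.

```python
import unicodedata

# Bidi classes treated as strong left-to-right / right-to-left.
STRONG_LTR = {"L"}
STRONG_RTL = {"R", "AL"}


def wrap_ltr_runs(text: str) -> str:
    """Wrap each maximal LTR run of `text` in \\LR{...}.

    Hypothetical helper (a simplification, not full UAX #9).
    A run starts on a strong-L character and ends on the last strong-L
    character reached before any strong-RTL character. Spaces, punctuation
    and digits between those two points stay inside the run; anything after
    the last strong-L character (e.g. a trailing space or '!') stays outside.
    Assumes `text` is already escaped for TeX.
    """
    out = []
    i, n = 0, len(text)
    while i < n:
        if unicodedata.bidirectional(text[i]) in STRONG_LTR:
            end = i  # index of the last strong-L character in this run
            j = i + 1
            # Scan forward until a strong RTL character interrupts the run.
            while j < n and unicodedata.bidirectional(text[j]) not in STRONG_RTL:
                if unicodedata.bidirectional(text[j]) in STRONG_LTR:
                    end = j
                j += 1
            out.append("\\LR{" + text[i:end + 1] + "}")
            i = end + 1
        else:
            # RTL, neutral or weak character outside any LTR run: copy as-is.
            out.append(text[i])
            i += 1
    return "".join(out)
```

For an Arabic citation containing an English title, such as `مرجع Hello World نص`, this yields `مرجع \LR{Hello World} نص`: only the English words and the space between them are grouped, and the surrounding spaces are left to bidi's normal RTL handling.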
  --scott

P.S. Also, CJK languages aren't supported by polyglossia, so
support for short CJK embeds inside articles (as well as for
zh.wikipedia.org, ko.wikipedia.org, etc.) is lagging.  Does anyone have
any advice on integrating the various CJK packages with polyglossia?

-- 
                         ( http://cscott.net/ )

