[XeTeX] xetex and the unicode bidirectional algorithm.

Khaled Hosny khaledhosny at eglug.org
Wed Dec 11 03:36:31 CET 2013


On Tue, Dec 10, 2013 at 11:11:27AM -0500, C. Scott Ananian wrote:
> On Tue, Dec 10, 2013 at 6:09 AM, Zdenek Wagner <zdenek.wagner at gmail.com> wrote:
> > 2013/12/10 Keith J. Schultz <keithjschultz at web.de>:
> >> I will repeat I do not know Vietnamese so I can not give you
> [...]
> >> Now, if "sang" is true Vietnamese and not a latinized form, I stand corrected! Though I have
> [...]
> > Yes, it is true Vietnamese word. I do not know Vietnamese, I could
> 
> https://www.google.com/search?q=sang+site%3Avi.wikipedia.org
> 
> ..which is indeed the issue I am attempting to deal with (trying to
> put the discussion back on track) -- a bunch of user authored content
> which looks correct to a native speaker when using the unicode bidi
> algorithm (implemented in the browser).  Language tags are only
> applied sporadically when needed to correct some obvious issue --
> although the future Visual Editor project at wikimedia hopes to make
> language tagging a more integrated part of the editing process.
> 
> Language tagging uses the HTML <span lang="...." dir="...."> standard.
>  Directionality tagging uses <bdo> and <bdi> where necessary.  But
> again, the point of the bidi algorithm is to avoid the necessity of
> manual tagging in many cases.
> 
> Ultimately, Wikipedia's goal is to allow the largest number of
> individual authors the ability to create encyclopedic content in their
> language as easily as possible.  Our greatest challenge is the "as
> easily as possible" part.  We can't impose language tagging as a
> barrier to entry, when it is not necessary for the author's text to be
> readable and useful to the public.

There is a big difference between (barely) readable text and
typographically correct text. If your goal is only the former, language
tagging can be skipped (and you can forget about hyphenation, too,
except for the main document language, which is hopefully already
known).

This leaves you with the BiDi algorithm, for which many implementations
exist that you might be able to use to process your text before
generating TeX files. There is even a TeX pre-processor that can apply
the BiDi algorithm to TeX documents, which you might be able to use or
adapt (I never used it myself, and it was written for e-TeX, but XeTeX's
RTL model is essentially the same, so it should work in theory).

Regards,
Khaled


More information about the XeTeX mailing list