[XeTeX] xetex and the unicode bidirectional algorithm.

Zdenek Wagner zdenek.wagner at gmail.com
Mon Dec 9 17:15:20 CET 2013

2013/12/9  <mskala at ansuz.sooke.bc.ca>:
> On Mon, 9 Dec 2013, Khaled Hosny wrote:
>> >    U+E0001 U+E0065 U+E006E U+0073 U+0061 U+006E U+0067
>> And it is a kind of tagging, so beyond the scope of identifying the
>> language of *untagged* text (which is the claim that spurred all this
>> discussion).
> The claim was "A properly encoded utf-8 string should contain everything
> you need!".  If you forbid using Unicode tag characters, then you're
> saying "It is impossible to encode language in Unicode when you're not
> allowed to use the features designed for that purpose," which is not
> an interesting statement.
> Yes, of course some kind of tagging is needed.  Keith seems to think that
> the tagging will magically come from "proper" UTF-8, and of course he's
> wrong.  I think language tagging would be possible in pure Unicode, as the
> string above demonstrates, but that's not a good way to do it.  The really
> original question had to do with RTL versus LTR detection, not language
> detection, and that's a different issue.
> Unicode specifies a way to detect RTL versus LTR, such that in many cases
> it doesn't require tagging.  Unicode's way of doing it may or may not be a
> good one, but we cannot reasonably pretend that it doesn't exist.  The
> Unicode bidi algorithm does exist.  XeTeX does not implement the Unicode
> bidi algorithm.  The interesting remaining question is whether XeTeX
> should implement it.  I tend to think not - because if we implement it,
> people will blame us for its failings.  It'd also be a lot of work, break
> compatibility with the rest of the TeX world, STILL require tagging in
> many cases, and so on.
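
The untagged detection mentioned in the quoted text comes down, at the
paragraph level, to the "first strong character" rule (rules P2 and P3 of
the Unicode bidi algorithm). Below is a minimal sketch of that one rule
only, not of the full algorithm: the function name base_direction and its
default argument are made up for illustration, and the handling of
directional isolates is deliberately left out.

```python
import unicodedata


def base_direction(text, default="LTR"):
    """Guess the paragraph direction from the first strong character.

    Simplified sketch of rules P2/P3 of the Unicode bidi algorithm.
    Assumption: isolate initiators and their contents are not skipped,
    which the real algorithm requires.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":            # strong left-to-right
            return "LTR"
        if bidi_class in ("R", "AL"):    # strong right-to-left (AL = Arabic letter)
            return "RTL"
    # No strong character found (digits, spaces, punctuation only).
    return default


if __name__ == "__main__":
    print(base_direction("میں نے"))            # starts with an Arabic letter
    print(base_direction("\\textbf{میں نے}"))  # first strong character is the Latin "t"
```

With this rule a paragraph that starts with a TeX macro is classified by
the Latin letters of the macro name, even if everything inside the braces
is Urdu, which is one case where explicit tagging is still needed.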
A bit off topic: do you know a good Linux text editor with a properly
implemented bidi algorithm, so that I could type multilingual texts?
Even the combination of Urdu and TeX macros is a pain; it is not easy
to type
\textbf{میں نے
 کو سب کچھ کیا۔}
I am not able to type it on a single line: gedit, kate and even Gmail
and Facebook get confused and produce garbage if I mix LTR and RTL
scripts. I can only use a commercial XML editor that allows me to
combine text in a Latin script with text in Hindi and Urdu.

Zdeněk Wagner

