<html><head><meta http-equiv="Content-Type" content="text/html charset=us-ascii"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;">Hi Scott,<div><br></div><div>We are talking about Unicode here, right? So what is there to guess?</div><div><br></div><div>Then there is always the possibility of having the text tagged when it is written by the original</div><div>author. Of course, that only works when you can control their input tools.</div><div><br></div><div>Lua(La)TeX has another great feature: you have a complete programming language</div><div>you can use to manipulate data/text before it is processed by TeX, or even after it has been</div><div>processed by TeX.</div><div>This makes manipulating and processing text easier than doing it in TeX itself.</div><div><br></div><div>regards</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>Keith.</div><div><br><div><div>On 04.12.2013 at 14:24, C. Scott Ananian <<a href="mailto:cscott@cscott.net">cscott@cscott.net</a>> wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><p dir="ltr">The goal is to match the Unicode bidi algorithm, because that is how the web page is displayed and thus how the original author saw the text as they wrote it. Guessing the proper language tag to use is likely infeasible; note that the example given contains titles in Turkish as well as English. The safest option is probably to treat embedded LTR text in an RTL context as 'exotic' and not to attempt hyphenation.</p><p dir="ltr">I've heard it said that LuaTeX has "better bidi support". What does that mean, exactly? Should I be considering switching?<br>
--scott</p>
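<p>A minimal Python sketch of the approach described above, assuming the title text is available as a plain Unicode string. It uses the standard library's <code>unicodedata.bidirectional()</code> to find runs of left-to-right text inside right-to-left material, i.e. the runs one might treat as 'exotic' and typeset without hyphenation. It is not the full Unicode Bidirectional Algorithm (no embeddings, isolates, or level resolution); it only approximates rule W7 (European digits after strong L text count as L) and rule N1 (neutrals between two strong L characters take L direction). The function name <code>ltr_runs</code> is hypothetical.</p>
<pre>
```python
import unicodedata

# Bidi classes of strong right-to-left letters (Hebrew is R, Arabic is AL).
STRONG_RTL = {"R", "AL"}


def ltr_runs(text):
    """Return maximal spans that start and end with left-to-right text and
    contain no strong right-to-left letter.

    Rough approximation of two rules of the Unicode Bidirectional Algorithm
    (not the full algorithm):
      * W7: a European digit (EN) whose nearest preceding strong character
        is L is treated as L.
      * N1: neutrals (spaces, punctuation) between two L characters stay in
        the run; neutrals at the edges of a run are dropped.
    """
    runs = []
    start = end = None   # bounds of the LTR run being collected
    last_strong = None   # most recent strong direction seen: "L", "R" or None
    for i, ch in enumerate(text):
        cls = unicodedata.bidirectional(ch)
        if cls == "L" or (cls == "EN" and last_strong == "L"):
            if start is None:
                start = i
            end = i + 1          # extend the run through this character
            last_strong = "L"
        elif cls in STRONG_RTL:
            last_strong = "R"
            if start is not None:  # an RTL letter closes the current run
                runs.append(text[start:end])
                start = end = None
    if start is not None:
        runs.append(text[start:end])
    return runs


if __name__ == "__main__":
    # An Arabic word, an English title with a year, then Arabic again.
    sample = ("\u0645\u0642\u0627\u0644\u0629: Bidirectional text, 2013 "
              "\u0645\u0642\u0627\u0644\u0629")
    print(ltr_runs(sample))
```
</pre>
<p>For the sample string this returns the single run <code>Bidirectional text, 2013</code>: the comma and space inside the English title are kept, while the colon and space next to the Arabic word are not.</p>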
<div class="gmail_quote">On Dec 4, 2013 4:08 AM, "Keith J. Schultz" <<a href="mailto:schultzk@uni-trier.de">schultzk@uni-trier.de</a>> wrote:<br type="attribution"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Hi Scott,<br>
<br>
On 03.12.2013 at 19:42, C. Scott Ananian <<a href="mailto:cscott@cscott.net">cscott@cscott.net</a>> wrote:<br>
<br>
><br>
> But in the XeLaTeX/polyglossia/bidi output, the neutral ("soft")<br>
> directionality of spaces in the Unicode BiDi algorithm doesn't seem to be<br>
> honored (or implemented?), and so the English article titles appear<br>
> with the individual words in RTL order, which is a mess. Manually<br>
> tagging the language of the article title is probably the Right thing,<br>
> but infeasible across all of Wikipedia.<br>
Well, without proper tagging you cannot expect any system to<br>
work properly or as expected!<br>
For most entries, a simple script should be enough to add<br>
language tags to the article titles.<br>
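<p>As a rough illustration of such a script, here is a Python sketch, assuming the titles are available as plain strings and that the document loads polyglossia with English as an other language, so that <code>\textenglish{...}</code> is defined. Titles containing only left-to-right letters are escaped and wrapped; titles containing any Arabic or Hebrew letters are only escaped. It does not guess the language, so Turkish titles would also end up in <code>\textenglish</code>; the <code>command</code> parameter can instead be set to the bidi package's direction-only <code>\LR</code>. The helper names <code>tex_escape</code>, <code>is_ltr_title</code> and <code>tag_title</code> are hypothetical.</p>
<pre>
```python
import unicodedata

# TeX special characters and safe replacements for running text.
TEX_ESCAPES = {
    "\\": r"\textbackslash{}",
    "{": r"\{",
    "}": r"\}",
    "$": r"\$",
    "&": r"\&",
    "#": r"\#",
    "%": r"\%",
    "_": r"\_",
    "^": r"\textasciicircum{}",
    "~": r"\textasciitilde{}",
}


def tex_escape(s):
    """Escape TeX special characters one character at a time, so the braces
    inside a replacement are never escaped a second time."""
    return "".join(TEX_ESCAPES.get(ch, ch) for ch in s)


def is_ltr_title(title):
    """True if the title has strong LTR letters and no strong RTL letters."""
    classes = {unicodedata.bidirectional(ch) for ch in title}
    return "L" in classes and not (classes & {"R", "AL"})


def tag_title(title, command=r"\textenglish"):
    """Escape a title and wrap it in `command` if it is purely LTR.

    Mixed or RTL-only titles are just escaped; they still need review.
    """
    escaped = tex_escape(title)
    if is_ltr_title(title):
        return command + "{" + escaped + "}"
    return escaped


if __name__ == "__main__":
    for t in ["Bidi & LuaTeX", "\u0645\u0642\u0627\u0644\u0629"]:
        print(tag_title(t))
```
</pre>
<p>Leaving mixed-direction titles untouched is deliberate: those are exactly the cases where a script cannot safely decide, and where the manual tagging mentioned above is still needed.</p>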
<br>
Hope this helps<br>
regards<br>
Keith.<br>
<br>
<br>
--------------------------------------------------<br>
Subscriptions, Archive, and List information, etc.:<br>
<a href="http://tug.org/mailman/listinfo/xetex" target="_blank">http://tug.org/mailman/listinfo/xetex</a><br>
</blockquote></div>
</blockquote></div><br></div></body></html>