[XeTeX] XeTeX in lshort

Philipp Stephani st_philipp at yahoo.de
Mon Sep 27 02:04:12 CEST 2010


On 27 Sep 2010, at 01:16, Mojca Miklavec wrote:

> Abstract: lshort needs a chapter/section about Unicode on its own.
> From what I experience here, a lot of TeX users are so brainwashed into
> using \v{c} and the like

Well: this is the official way to enter the character č, as described in lshort.pdf, p. 24.
In the nineties, when I started writing a few puny HTML documents, I learned that I had to type &auml; to get ä. It never occurred to me that this was silly: why should I type a six-character string instead of a character I could enter with a single keystroke? It was just a technical requirement. Most users still think like that: they have learned that “special characters” cause problems and should therefore be avoided. Even the wording is problematic: why are ä, č, π, ≤, 儹 and ص special, but A, ` and # not? Unicode should have liberated us from such thinking.
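Just to make the point concrete: with a UTF-8-aware setup, the accent macro and the plain character are interchangeable. A minimal pdfLaTeX sketch (standard inputenc/fontenc usage):

```latex
% Accent macro vs. direct Unicode input -- both print the same glyph
\documentclass{article}
\usepackage[utf8]{inputenc}  % interpret the source file as UTF-8
\usepackage[T1]{fontenc}     % font encoding with precomposed accented glyphs
\begin{document}
\v{c} and č come out identically.
\end{document}
```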

> UTF-8 can/should just as well be
> used in pdfLaTeX.

Again lshort.pdf disagrees (see p. 26): you are supposed to use Mac Roman on the Mac, Latin-1 on Unix (what is that? It can’t be Linux, since that is not Unix; do they mean OS X, which is Unix? But what about Mac Roman, then?), Windows-1252 on Windows, and CP-850 on DOS and OS/2 (of course highly relevant in 2010). There are also suggestions for Cyrillic documents (apparently Western European Latin and Cyrillic are the only scripts in the world). If you want more, lshort.pdf encourages you to use the unmaintained ucs package instead of inputenc’s maintained utf8 support.
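Spelled out, that per-platform advice means picking a different inputenc option on every system, where a single line would do. A sketch (you would of course load only one of these; the option names are the standard inputenc ones):

```latex
% Legacy, per-platform source encodings -- choose one per system:
\usepackage[applemac]{inputenc} % Mac Roman (classic Mac OS)
\usepackage[latin1]{inputenc}   % Latin-1 ("Unix")
\usepackage[ansinew]{inputenc}  % Windows-1252
\usepackage[cp850]{inputenc}    % DOS and OS/2

% Versus one line that works everywhere:
\usepackage[utf8]{inputenc}
```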
What I want to say is: before thinking about including XeTeX in lshort (and further redefining the meaning of the word “short”), one should perhaps thoroughly clean up the existing document. It could probably be made much more concise by throwing out the legacy baggage.

> Second: I'm not sure if a special section is needed
> to mention all the zillions of methods to enter Unicode characters,
> though that could be covered in a separate document.
> 
> On Sun, Sep 26, 2010 at 22:08, Tobias Schoel wrote:
>> 
>> Agreed. But one should at least give a reference link to information about
>> how to input Unicode in Windoof, OS X and Linux respectively. There is no
>> advantage in telling the people in lshort: "There is also XeLaTeX, which
>> lets you input everything in unicode and use any OpenType or TrueType font
>> on your system.", if you don't tell them how to do this or at least where to
>> find information about it. These people will simply say: "What the heck is
>> Unicode again? I simply press the keys on my keyboard."
>> 
>> The "usual" Windows user has little to no knowledge of input methods
>> other than what he is used to. Consequently he won't see any advantage
>> in _being allowed to_ enter any Unicode character if he isn't _able to_.
> 
> Wait. Unicode is not only about US users who need to typeset an accent
> every now and then, but also about users who know how to use their
> local keyboard, but keep using cp-1250, cp-1252, ...
> 
> Maybe I'm wrong, but (talking about a couple of years ago) I found it
> much more difficult to actually *save* the file in proper encoding
> (with editors defaulting to some random local encoding) than to
> properly enter the character that I needed for whatever reason.
> 
> Our keyboard on windows is capable of producing most accented latin
> letters, but I don't think that anyone would want to explain how to
> use every single keyboard in that short section.
> 
> There are two kinds of non-ascii character usage: writing in native
> language and writing foreign names. The second is a bit exceptional
> and can often be dealt with by copy-paste\footnote{I maintain the package
> with hyphenation patterns that needs a lot of different Unicode
> characters for various reasons and I have created my own keyboard
> layout, but I don't have the slightest idea how to input any accented
> character apart from those used in my language + German; neither in my
> OS nor in my text editor; copy-paste fully serves the purpose; on
> Windows we had a labeled keyboard with dead keys, but now I survive
> without}. Those who need to type characters from their local alphabet
> usually know how to do that. After all, they need to use other
> applications and their keyboard is usually configured properly.
> 
> But all these users don't necessarily know anything about file encodings
> (and why should they?). They just use whatever encoding works for
> them. I kept using 8bit encoding for a long time after I already knew
> that UTF-8 was a better choice just because editors had enormous
> problems with Unicode.
> 
> My favorite TeX editor (WinEdt) didn't support Unicode at all some
> years ago, and even (g)VIM *still* defaults to cp-1250 nowadays; it
> is pretty non-straightforward for a new user to convince it to use
> UTF-8. My teacher, who uses a Mac, opened a document in UTF-8, edited a
> whole bunch of stuff and saved it in MacRoman encoding,
> irreversibly (even though Macs are well known for their
> "Unicode"-awareness and TeXShop is considered to be a solid editor,
> but well ... the defaults are still 8-bit).

Which is a pity indeed. Everybody, including Microsoft, encourages you to use Unicode encodings, but few seem to have enough courage to make them the default.
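For comparison, the XeLaTeX route this thread is about needs no encoding declaration at all: the source is UTF-8 by definition, and system fonts are selected through fontspec. A minimal sketch (the font name is only an example and must be installed on your system):

```latex
% XeLaTeX: UTF-8 source and OpenType fonts out of the box
\documentclass{article}
\usepackage{fontspec}
\setmainfont{Linux Libertine O} % any installed OpenType/TrueType font
\begin{document}
č, ä, π, ≤ and ص can be typed directly.
\end{document}
```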


