[XeTeX] Whitespace in input
Keith J. Schultz
keithjschultz at web.de
Sat Nov 19 14:19:17 CET 2011
OUCH! I have been hit by a veteran truck driver's truck. ;-))
I concede!
I am curious whether many still know what an XX-bit word is. Is that term even still used?
True, Unicode needs to be cleaned up; it has become too fragmented.
regards
Keith.
On 19.11.2011 at 09:39, Philip TAYLOR wrote:
>
>
> Keith J. Schultz wrote:
>
>> I do not think anybody disputes the fact that characters are not glyphs.
>>
>> The confusion arises because a character in CS is well defined and has a history.
>> To be more exact, it is just one byte in size, so there can be only 256 characters.
>
> Sorry, Keith, this is patently untrue. Replace "is" by "was once" and
> you get a little closer to the truth, but you still completely ignore
> issues such as the difference between (say) EBCDIC and ASCII. CDC machines
> used a 60-bit word, and one character was six bits, not eight. And before
> the advent of the extended character set, a character consisted of seven
> bits plus a parity bit, thus yielding at most 128 characters of which
> 32 were reserved for control functions.
>
>> The average user considers a glyph to be the same as a "letter" and thereby a character.
>
> It is rarely safe to believe that one knows what the average user thinks ...
>
>> Now, in order to process the glyphs with a computer, they must be decomposed back to Unicode.
>
> But one rarely, if ever, "processes glyphs"; the glyphs are the end result,
> not the input. Glyph processing does become necessary in languages such
> as Arabic, where context has a major impact on the way in which the
> individual glyphs are presented, but in Western languages the nearest we
> get to "glyph processing" is in the formation of ligature digraphs.
>
>> How well this is done depends on the system itself. If the system is not fully Unicode-aware and
>> does not implement it properly, then there will be problems. What adds to the complexity of the problem is that
>> not all fonts used for displaying Unicode contain all code points, thereby creating your many-to-many
>> decomposition.
>>
>> As for getting junk when copying Unicode, just copy between two texts using different fonts, where one font does
>> not contain the glyph.
>>
>> The only true way to master this problem is for the computer world to go completely Unicode, with
>> fonts supporting the full Unicode code set!
>
> I personally hope that this does not happen, and that before then
> we have an "Omnicode consortium" to review the mistakes of Unicode
> and to address them in a future, more orthogonal, more consistent,
> specification.
>
> Philip Taylor