[XeTeX] Whitespace in input

Keith J. Schultz keithjschultz at web.de
Sat Nov 19 14:19:17 CET 2011


OUCH! I have been hit by a veteran truck driver's truck. ;-))

I concede! 

I am curious whether many people still know what an XX-bit word is. Is that term even still used?

In turn, Unicode needs to be cleaned up; it has become too fragmented.

regards
	Keith.

Am 19.11.2011 um 09:39 schrieb Philip TAYLOR:

> 
> 
> Keith J. Schultz wrote:
> 
>> 	I do not think anybody disputes the fact that characters are not glyphs.
>> 
>> 	The confusion arises because a character in CS is well defined and has a history.
>> 	To be more exact, it is just one byte in size, so there can be only 256 characters.
> 
> Sorry, Keith, this is patently untrue.  Replace "is" by "was once" and
> you get a little closer to the truth, but you still completely ignore
> issues such as the difference between (say) EBCDIC and ASCII.  CDC machines
> used a 60-bit word, and one character was six bits, not eight.  And before
> the advent of the extended character set, a character consisted of seven
> bits plus a parity bit, thus yielding at most 128 characters of which
> 32 were reserved for control functions.
> 	
>> 	The average user considers a glyph to be the same as a "letter" and thereby a character.
> 
> It is rarely safe to believe that one knows what the average user thinks ...
> 
>> 	Now, in order to process the glyphs with a computer, they must be decomposed back into Unicode.
> 
> But one rarely, if ever, "processes glyphs"; the glyphs are the end result,
> not the input.  Glyph processing does become necessary in languages such
> as Arabic, where context has a major impact on the way in which the
> individual glyphs are presented, but in Western languages the nearest we
> get to "glyph processing" is in the formation of ligature digraphs.
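A minimal Python sketch of the character/glyph distinction, using the standard `unicodedata` module. The code points are illustrative choices, not taken from the thread: U+FB01 is Unicode's legacy compatibility character for the "fi" ligature, and U+FE91 is an Arabic presentation form (initial beh). NFKC normalization maps both back to the ordinary characters that a font or shaping engine would normally receive; the helper `describe` is hypothetical.

```python
import unicodedata

# In ordinary text "fi" is stored as two characters; a font may still draw
# them with a single ligature glyph. U+FB01 exists only for compatibility.
FI_LIGATURE = "\ufb01"
# Contextual Arabic shapes also have legacy presentation-form code points.
BEH_INITIAL = "\ufe91"


def describe(text: str) -> list[tuple[str, str]]:
    """Return (code point, Unicode name) for every character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]


if __name__ == "__main__":
    print(describe("fi"))         # two characters: f, i
    print(describe(FI_LIGATURE))  # one compatibility character
    # NFKC folds compatibility characters back to the plain letters ...
    print(unicodedata.normalize("NFKC", FI_LIGATURE) == "fi")        # True
    # ... while NFC leaves them alone (no canonical decomposition).
    print(unicodedata.normalize("NFC", FI_LIGATURE) == "fi")         # False
    print(unicodedata.normalize("NFKC", BEH_INITIAL) == "\u0628")    # True
```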
> 
>> 	How well this is done depends on the system itself. If the system is not fully Unicode-aware and does not
>> 	implement it properly, then there will be problems. What adds to the complexity of the problem is that
>> 	not all fonts used for displaying Unicode contain all code points, thereby creating your many-to-many
>> 	decomposition.
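The decomposition point can be illustrated with Unicode normalization in Python's standard `unicodedata` module. This is a sketch that assumes NFC/NFD as the relevant notion of decomposition; the helper `same_text` is hypothetical. It shows one visible letter with two valid encodings that only compare equal once both are normalized.

```python
import unicodedata

# "é" as one precomposed code point (U+00E9) ...
precomposed = "\u00e9"
# ... or as "e" followed by U+0301 COMBINING ACUTE ACCENT.
decomposed = "e\u0301"


def same_text(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two strings after bringing both to the same normal form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


if __name__ == "__main__":
    print(precomposed == decomposed)                                # False
    print(len(precomposed), len(decomposed))                        # 1 2
    print(same_text(precomposed, decomposed))                       # True
    print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True
    print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```

A program that compares or searches text without normalizing first will treat these two spellings as different strings, even though they render identically.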
>> 
>> 	As for getting junk when copying Unicode, just copy between two texts using different fonts, where one font does
>> 	not contain the glyph.
>> 
>> 	The only true way to master this problem is for the computer world to go completely to full Unicode, with
>> 	fonts supporting the full Unicode code set!
> 
> I personally hope that this does not happen, and that before then
> we have an "Omnicode consortium" to review the mistakes of Unicode
> and to address them in a future, more orthogonal, more consistent,
> specification.
> 
> Philip Taylor
> 
> 
> --------------------------------------------------
> Subscriptions, Archive, and List information, etc.:
> http://tug.org/mailman/listinfo/xetex
