[XeTeX] xunicode for maths

Jonathan Kew jonathan_kew at sil.org
Thu Feb 16 10:30:35 CET 2006


On 16 Feb 2006, at 12:05 am, Will Robertson wrote:

> Jonathan Kew wrote:
>>
>> One comment: please consider expressing the character codes you're  
>> accessing as true Unicode Scalar Value numbers, rather than pairs  
>> of surrogate codes.
>
> These USVs are just the UTF-32 representation of the character,  
> right? (omitting any initial zeros...)

Right... only I think I'd express it the other way round (just to
be picky!). Unicode Scalar Values are the real "character codes"
assigned to the characters in the Unicode repertoire, and they're
simply numbers (with no specified size or machine representation).

The various UTF-xx things are encoding forms: ways to represent
those integers in code units of a given bit width. As the maximum
USV is 0x10FFFF, any USV fits directly in a single 32-bit integer,
so that's what UTF-32 is: an encoding form that represents Unicode
Scalar Values as 32-bit integers, using the obvious identity mapping.
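
To make the identity mapping concrete, here is a minimal Python sketch. The choice of U+1D49C (MATHEMATICAL SCRIPT CAPITAL A) and the helper name utf32_code_unit are purely illustrative assumptions, not anything taken from xunicode itself.

```python
import struct

MAX_USV = 0x10FFFF


def utf32_code_unit(usv: int) -> int:
    """Return the single UTF-32 code unit for a Unicode Scalar Value.

    Hypothetical helper for illustration: the mapping is the identity,
    so the only real work is checking that the input is a valid USV.
    """
    if not 0 <= usv <= MAX_USV:
        raise ValueError("outside the Unicode code space")
    if 0xD800 <= usv <= 0xDFFF:
        raise ValueError("surrogate code points are not scalar values")
    return usv


usv = 0x1D49C  # MATHEMATICAL SCRIPT CAPITAL A, chosen as an example
encoded = chr(usv).encode("utf-32-be")  # big-endian, no byte order mark
(unit,) = struct.unpack(">I", encoded)  # read back one 32-bit code unit

print(f"USV U+{usv:04X} -> UTF-32 code unit 0x{unit:08X}")
```

The 32-bit code unit read back from the encoded bytes is numerically the same as the USV, which is all "identity mapping" means here.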

UTF-16 and UTF-8 are encoding forms that represent the USVs as  
sequences of 16-bit or 8-bit code units, and so they have somewhat  
less straightforward mappings between USV and code units. But UTF-32,  
-16, and -8 are all merely ways of representing the same repertoire  
of USVs.
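
For comparison, a rough sketch of the less straightforward UTF-16 mapping, again in Python and again using U+1D49C only as an example. The function names to_utf16_units and from_surrogates are hypothetical; the arithmetic is the standard surrogate-pair calculation.

```python
def to_utf16_units(usv: int) -> list[int]:
    """Encode one Unicode Scalar Value as a list of 16-bit code units."""
    if not 0 <= usv <= 0x10FFFF or 0xD800 <= usv <= 0xDFFF:
        raise ValueError("not a Unicode Scalar Value")
    if usv < 0x10000:
        return [usv]  # BMP characters need a single code unit
    offset = usv - 0x10000  # 20 bits left to split across two units
    high = 0xD800 + (offset >> 10)  # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits
    return [high, low]


def from_surrogates(high: int, low: int) -> int:
    """Recover the USV from a high/low surrogate pair."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


usv = 0x1D49C  # MATHEMATICAL SCRIPT CAPITAL A, chosen as an example
utf16 = to_utf16_units(usv)
utf8 = chr(usv).encode("utf-8")

print("UTF-16:", " ".join(f"{u:04X}" for u in utf16))
print("UTF-8: ", " ".join(f"{b:02X}" for b in utf8))
print("back to USV:", f"U+{from_surrogates(*utf16):04X}")
```

So a table keyed on USVs can write 0x1D49C once, and any surrogate pair can be derived from it mechanically if some encoding form needs one.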

Clear as mud? ;-)

JK



