[XeTeX] xunicode for maths
Jonathan Kew
jonathan_kew at sil.org
Thu Feb 16 10:30:35 CET 2006
On 16 Feb 2006, at 12:05 am, Will Robertson wrote:
> Jonathan Kew wrote:
>>
>> One comment: please consider expressing the character codes you're
>> accessing as true Unicode Scalar Value numbers, rather than pairs
>> of surrogate codes.
>
> These USVs are just the UTF-32 representation of the character,
> right? (omitting any initial zeros...)
Right... only I think I'd express it the other way round. (Just to
be picky!) Unicode Scalar Values are the real "character codes"
assigned to the characters in the Unicode repertoire, and they're
simply numbers (with no specified size or machine representation).
The various UTF-xx things are encoding forms, i.e. ways to
represent those arbitrary-sized integers in code units of a given bit
width. As the maximum USV is 0x10FFFF, it's possible to represent any
USV directly in a single 32-bit integer, so that's what UTF-32 is: an
encoding form that represents Unicode Scalar Values as 32-bit
integers, using the obvious identity mapping.
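
To make the identity mapping concrete, here is a minimal Python sketch. The helper name utf32_code_unit is made up for illustration, and U+1D400 (MATHEMATICAL BOLD CAPITAL A) is just a sample character from the maths range; the only library calls used are the built-in chr, str.encode and struct.unpack.

```python
import struct

MAX_USV = 0x10FFFF  # largest Unicode Scalar Value


def utf32_code_unit(usv):
    """Return the single UTF-32 code unit for a scalar value (hypothetical helper)."""
    # Surrogate code points D800..DFFF are not scalar values.
    if not (0 <= usv <= MAX_USV) or 0xD800 <= usv <= 0xDFFF:
        raise ValueError("not a Unicode Scalar Value: %#x" % usv)
    return usv  # identity mapping: the code unit is the number itself


usv = 0x1D400  # sample character, chosen only as an example
unit = utf32_code_unit(usv)

# Cross-check against Python's own big-endian UTF-32 encoder (no BOM).
encoded = chr(usv).encode("utf-32-be")
(decoded_unit,) = struct.unpack(">I", encoded)
```

Here both unit and decoded_unit come out equal to the original number, which is all the "obvious identity mapping" amounts to.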
UTF-16 and UTF-8 are encoding forms that represent the USVs as
sequences of 16-bit or 8-bit code units, and so they have somewhat
less straightforward mappings between USV and code units. But UTF-32,
-16, and -8 are all merely ways of representing the same repertoire
of USVs.
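
As a rough sketch of those less straightforward mappings, the Python below converts between a USV and its UTF-16 code units using the standard surrogate arithmetic, and then asks Python's built-in encoder for the UTF-8 bytes. The function names are hypothetical, and U+1D400 is only a sample character.

```python
def usv_to_utf16(usv):
    """Return the UTF-16 code units for a scalar value (hypothetical helper)."""
    if not (0 <= usv <= 0x10FFFF) or 0xD800 <= usv <= 0xDFFF:
        raise ValueError("not a Unicode Scalar Value: %#x" % usv)
    if usv < 0x10000:
        return [usv]  # BMP characters need a single code unit
    v = usv - 0x10000  # 20 bits, split into two 10-bit halves
    return [0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)]


def surrogates_to_usv(high, low):
    """Combine a high/low surrogate pair back into one scalar value."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


usv = 0x1D400  # sample character, chosen only as an example
pair = usv_to_utf16(usv)              # two 16-bit code units
round_trip = surrogates_to_usv(*pair)  # back to the same USV

# UTF-8 bytes for the same character, from Python's built-in encoder.
utf8_bytes = chr(usv).encode("utf-8")
```

For this character the pair is D835 DC00 and the UTF-8 form is the four bytes F0 9D 90 80; all of them stand for the one number 0x1D400.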
Clear as mud? ;-)
JK