[OS X TeX] Input encoding question
Jonathan Kew
jonathan at jfkew.plus.com
Fri Feb 20 11:11:00 CET 2009
On 20 Feb 2009, at 09:44, Jonathan Kew wrote:
> East Asian characters (the really big collection in terms of number
> of different characters) are mostly represented by 2 bytes each in
> UTF-8 .... but that's also true in pre-Unicode encodings.
Oops, a correction... as Unicode experts out there are well aware,
the majority of CJK characters require *three* bytes in UTF-8, not two,
and so typical Chinese or Japanese text will occupy about 50% more
space than in most legacy encodings, which use two bytes per character.
Nevertheless, because of the quantity of ASCII markup that is usually
present, the actual increase in real-world file sizes is normally much
less than this.
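
For anyone who wants to check the arithmetic, here is a rough sketch in
Python. The sample string, the LaTeX wrapper around it, and the choice of
EUC-JP as the legacy encoding are all just assumptions for illustration;
real documents will have a different mix of markup and text.

```python
def encoded_sizes(text, legacy="euc_jp"):
    """Return (UTF-8 byte count, legacy-encoding byte count) for text."""
    return len(text.encode("utf-8")), len(text.encode(legacy))

# Hypothetical sample: six Japanese characters, all in the BMP.
cjk = "日本語の文章"

# The same text wrapped in ASCII LaTeX markup (an invented example).
marked = "\\section{" + cjk + "}\n\\textbf{" + cjk + "}\n"

for label, sample in (("bare text", cjk), ("with markup", marked)):
    utf8_len, legacy_len = encoded_sizes(sample)
    growth = 100.0 * (utf8_len - legacy_len) / legacy_len
    print(f"{label}: UTF-8 {utf8_len} bytes, EUC-JP {legacy_len} bytes, "
          f"+{growth:.0f}%")
```

The bare text grows by the full 50%, while the marked-up version grows by
noticeably less, since the ASCII markup takes one byte per character in
both encodings.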
JK