[OS X TeX] Input encoding question

Jonathan Kew jonathan at jfkew.plus.com
Fri Feb 20 11:11:00 CET 2009

On 20 Feb 2009, at 09:44, Jonathan Kew wrote:

> East Asian characters (the really big collection in terms of number  
> of different characters) are mostly represented by 2 bytes each in  
> UTF-8 .... but that's also true in pre-Unicode encodings.

Oops, a correction: as Unicode experts out there are well aware,
the majority of CJK characters require *three* bytes in UTF-8, and so
it's true that typical Chinese or Japanese text will occupy 50% more
space than in most legacy encodings, which use two bytes per character.
Nevertheless, because of the quantity of ASCII markup that is usually
present, the actual increase in real-world file sizes is normally much
less than this.
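
For anyone who wants to check the arithmetic, here is a rough sketch in
Python. The sample strings, the choice of GB2312 as the legacy encoding,
and the helper name encoded_sizes are all just assumptions for
illustration; real documents will have a different mix of markup and text.

```python
def encoded_sizes(text, legacy_encoding):
    """Return (utf8_bytes, legacy_bytes) for the given text."""
    return len(text.encode("utf-8")), len(text.encode(legacy_encoding))


# Hypothetical samples: four CJK characters, alone and wrapped in ASCII markup.
pure_cjk = "中文文本"
with_markup = r"\section{中文文本}"

for label, text in (("pure CJK", pure_cjk), ("with markup", with_markup)):
    utf8_len, legacy_len = encoded_sizes(text, "gb2312")
    # Percentage growth of the UTF-8 form relative to the legacy form.
    growth = 100 * (utf8_len - legacy_len) / legacy_len
    print(f"{label}: UTF-8 {utf8_len} bytes, GB2312 {legacy_len} bytes (+{growth:.0f}%)")
```

The bare CJK string grows by the full 50% (12 bytes against 8), while the
same text inside a short ASCII command grows by noticeably less, since
the ASCII bytes are the same size in both encodings.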

