[XeTeX] xetex doesn't recognize/replace all invalid utf8 bytes

Herbert Schulz herbs at wideopenwest.com
Wed Dec 30 14:17:20 CET 2009


On Dec 30, 2009, at 4:07 AM, Peter Dyballa wrote:

> 
> Am 30.12.2009 um 01:54 schrieb Herbert Schulz:
> 
>> Is there a ``common'' name for that encoding?
> 
> 
> It could be "CP1252"; Mac OS X provides, for example, these files:
> 
> /usr/share/locale/ru_RU.CP1251
> /Applications/Adobe Reader.app/Contents/MacOS/Resource/TypeSupport/Unicode/Mappings/win/CP874.TXT
> /Applications/Adobe Reader.app/Contents/MacOS/Resource/TypeSupport/Unicode/Mappings/win/CP932.TXT
> /Applications/Adobe Reader.app/Contents/MacOS/Resource/TypeSupport/Unicode/Mappings/win/CP936.TXT
> /Applications/Adobe Reader.app/Contents/MacOS/Resource/TypeSupport/Unicode/Mappings/win/CP949.TXT
> /Applications/Adobe Reader.app/Contents/MacOS/Resource/TypeSupport/Unicode/Mappings/win/CP950.TXT
> /Applications/Adobe Reader.app/Contents/MacOS/Resource/TypeSupport/Unicode/Mappings/win/CP1250.TXT
> /Applications/Adobe Reader.app/Contents/MacOS/Resource/TypeSupport/Unicode/Mappings/win/CP1251.TXT
> /Applications/Adobe Reader.app/Contents/MacOS/Resource/TypeSupport/Unicode/Mappings/win/CP1252.TXT
> /Applications/Adobe Reader.app/Contents/MacOS/Resource/TypeSupport/Unicode/Mappings/win/CP1253.TXT
> /Applications/Adobe Reader.app/Contents/MacOS/Resource/TypeSupport/Unicode/Mappings/win/CP1254.TXT
> /Applications/Adobe Reader.app/Contents/MacOS/Resource/TypeSupport/Unicode/Mappings/win/CP1255.TXT
> /Applications/Adobe Reader.app/Contents/MacOS/Resource/TypeSupport/Unicode/Mappings/win/CP1256.TXT
> /Applications/Adobe Reader.app/Contents/MacOS/Resource/TypeSupport/Unicode/Mappings/win/CP1257.TXT
> /Applications/Adobe Reader.app/Contents/MacOS/Resource/TypeSupport/Unicode/Mappings/win/CP1258.TXT
> /Developer/SDKs/MacOSX10.5.sdk/System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/rexml/encodings/CP-1252.rb
> 
> You could also invoke on the command line:
> 
> 	iconv -l | grep CP
> 
> The iconv utility is meant to convert file contents between many encodings. It lists, among others:
> 
> 	CP1252 MS-ANSI WINDOWS-1252
> 
> --
> Greetings
> 
>  Pete
> 
> Almost anything is easier to get into than out of.
> 				– Allen's Law
> 

Howdy,

Well, here's the complete list of encodings from the TeXShop Help Panel:

•  MacOSRoman
•  IsoLatin
•  IsoLatin2
•  IsoLatin5
•  IsoLatin9
•  IsoLatinGreek
•  Mac Central European Roman
•  MacJapanese
•  DOSJapanese
•  SJIS_X0213
•  EUC_JP
•  JISJapanese
•  MacKorean
•  UTF-8 Unicode
•  Standard Unicode
•  Mac Cyrillic
•  DOS Cyrillic
•  DOS Russian
•  WindowsCentralEurRoman
•  Windows Cyrillic
•  KOI8_R
•  Mac Chinese Traditional
•  Mac Chinese Simplified
•  DOS Chinese Traditional
•  DOS Chinese Simplified
•  GBK
•  GB 2312
•  GB 18030

Is that encoding anywhere in the list? If not, make a request to Dick Koch to add it.
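
To check whether a given source file really is CP1252 rather than UTF-8, something along these lines might help. It is only a minimal Python sketch, not anything TeXShop or XeTeX does internally. It assumes the only two candidates are UTF-8 and CP1252, and the function names guess_encoding and to_utf8 are made up for illustration.

```python
def guess_encoding(data: bytes) -> str:
    """Return 'utf-8' if the bytes decode cleanly as UTF-8, else 'cp1252'.

    Assumption: the file is one of these two encodings. This is a
    heuristic, not a general-purpose encoding detector.
    """
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "cp1252"


def to_utf8(data: bytes) -> bytes:
    """Re-encode the bytes as UTF-8, using the guess above as the source encoding.

    CP1252 leaves a few byte values undefined; errors="replace" turns those
    into U+FFFD instead of raising an exception.
    """
    source_encoding = guess_encoding(data)
    text = data.decode(source_encoding, errors="replace")
    return text.encode("utf-8")


if __name__ == "__main__":
    sample = b"caf\xe9"            # "café" as CP1252 bytes; 0xE9 alone is invalid UTF-8
    print(guess_encoding(sample))  # which encoding the heuristic picked
    print(to_utf8(sample))         # the same text as UTF-8 bytes
```

Pure ASCII files pass the UTF-8 check, which is harmless since ASCII bytes mean the same thing in both encodings.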

Good Luck,

Herb Schulz
(herbs at wideopenwest dot com)