[XeTeX] [tex-hyphen] Help with UTF-8 Language

Reinhard Kotucha reinhard.kotucha at web.de
Sun Oct 12 01:51:31 CEST 2014


On 2014-10-10 at 07:51:53 +0200, Werner LEMBERG wrote:

 > Unfortunately I don't have time to write a Perl or Python script for
 > you, but it should be straightforward to program a small filter that
 > 
 >  (a) converts from UTF-8 to UTF-16
 >  (b) converts from UTF-16 to the ad-hoc 8bit encoding by stripping
 >       off the higher byte

Hi Werner, you don't need Perl for (a).

  iconv -f UTF-8 -t UTF-16BE -o <outfile> <infile>

or, more verbosely,

  iconv --from-code=UTF-8 --to-code=UTF-16BE --output=<outfile> <infile>
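A quick way to see what (a) produces is to dump the bytes with od.  This is a minimal sketch assuming an iconv that knows the names UTF-16BE and UTF-16LE (glibc's does).  The character é (U+00E9, UTF-8 bytes C3 A9) is written as octal escapes, so the example does not depend on the terminal encoding:

```shell
#!/bin/sh
# Dump the UTF-16BE bytes of "é" (U+00E9, UTF-8 bytes C3 A9).
# Octal escapes keep the example independent of the terminal encoding.
printf '\303\251' | iconv -f UTF-8 -t UTF-16BE | od -An -tx1
# UTF-16BE: high byte first  ->  00 e9

# The same character in little-endian order.
printf '\303\251' | iconv -f UTF-8 -t UTF-16LE | od -An -tx1
# UTF-16LE: low byte first   ->  e9 00
```

The high byte (00 here) comes first in UTF-16BE and last in UTF-16LE.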

Converting from UTF-16 to UTF-8 is easier

  iconv -f UTF-16 -t UTF-8 -o <outfile> <infile>

because the byte order is determined by the BOM.
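To illustrate the BOM point, here is é encoded as UTF-16 in both byte orders, each starting with a byte order mark (FE FF for big-endian, FF FE for little-endian).  With -f UTF-16, iconv should pick the byte order from the BOM and drop the BOM itself.  This is a small sketch assuming glibc's iconv; implementations may differ in how they treat input without a BOM, so that case is not shown:

```shell
#!/bin/sh
# "é" (U+00E9) as UTF-16 with a BOM, written as octal escapes.
be='\376\377\000\351'   # BOM FE FF, then 00 E9 (big-endian)
le='\377\376\351\000'   # BOM FF FE, then E9 00 (little-endian)

# iconv reads the BOM to decide the byte order; both give the UTF-8 bytes c3 a9.
printf "$be" | iconv -f UTF-16 -t UTF-8 | od -An -tx1
printf "$le" | iconv -f UTF-16 -t UTF-8 | od -An -tx1
```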


(b) is more difficult due to the endianness.  The byte to strip is
always the high byte, but whether it is the first or the second byte
of each pair depends on whether you converted to UTF-16BE or UTF-16LE.
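For (b), a filter that needs neither Perl nor Python can be sketched with od, awk and printf.  This is a minimal sketch under a few assumptions: the input is UTF-16BE from step (a), so the high byte comes first; it contains only BMP characters (no surrogate pairs); and all non-ASCII characters share one high byte, so dropping it yields a usable ad-hoc 8-bit encoding.  ASCII characters have a high byte of 00 and pass through unchanged.  The function name strip_high_byte is hypothetical:

```shell
#!/bin/sh
# Hypothetical filter for step (b): read UTF-16BE on stdin and write only
# the low byte of each two-byte code unit.  Assumes BMP-only input.
strip_high_byte() {
  # od prints every input byte as an unsigned decimal number; awk keeps
  # every second byte (the low byte in UTF-16BE) and emits it as a \ooo
  # octal escape.  For UTF-16LE input, keep n % 2 == 1 instead.
  fmt=$(od -An -v -tu1 | awk '{
    for (i = 1; i <= NF; i++)
      if (++n % 2 == 0) printf "\\%03o", $i
  }')
  # printf turns the octal escapes back into raw bytes.
  printf "$fmt"
}

# Usage, steps (a) and (b) combined:
#   iconv -f UTF-8 -t UTF-16BE infile | strip_high_byte > outfile
```

For example, é (U+00E9) becomes the single byte E9, and Devanagari क (U+0915) becomes 15.  The whole file is held in a shell variable, which is fine for pattern files but not for very large input.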


iconv is ubiquitous on Linux and available on most other Unix systems too.
However, a few months ago I created binaries for Windows using the MXE
cross compiler.  Extract

  http://ms25.x64.me/w32/iconv/iconv.zip

into a directory that is listed in your PATH.

Regards,
  Reinhard

-- 
------------------------------------------------------------------
Reinhard Kotucha                            Phone: +49-511-3373112
Marschnerstr. 25
D-30167 Hannover                    mailto:reinhard.kotucha at web.de
------------------------------------------------------------------

