[l2h] current state of unicode support

Ross Moore ross at ics.mq.edu.au
Sun Jul 6 23:53:07 CEST 2003

On Sun, 6 Jul 2003, Janusz S. Bień wrote:

> Can latex2html accept some form of unicode (utf8 or utf16) as input?

Yes, and no.

The `no' means that there is nothing in LaTeX2HTML that is specifically
designed to support this kind of input.

The `yes' means that the effect of supplying UTF-8 *should* be that any
bytes in the upper range (0x80 to 0xFF) go through unchanged. Every byte
of a multibyte UTF-8 sequence lies in that range.

If this does not happen by default, it is because the default charsets
assume that upper-8-bit characters have a special meaning: they are
translated into equivalent TeX sequences, which may in turn require an
image to be created.
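To see why such a charset interferes with UTF-8 input, here is a minimal
Python sketch. It is an illustration only: LaTeX2HTML itself is written
in Perl, and its charset code is not modelled here. It assumes the input
is UTF-8 and that the default charset reads each upper byte as a single
ISO-8859-1 (Latin-1) character; the helper names `upper_bytes` and
`misread_as_latin1` are hypothetical.

```python
# Illustration only (assumption): a charset that reads every byte >= 0x80
# as one ISO-8859-1 (Latin-1) character, applied to UTF-8 input.

def upper_bytes(text: str) -> list[int]:
    """Return the bytes of text's UTF-8 encoding that lie in 0x80..0xFF."""
    return [b for b in text.encode("utf-8") if b >= 0x80]

def misread_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then decode those bytes as if they were Latin-1."""
    return text.encode("utf-8").decode("latin-1")

word = "Bień"  # 'ń' is U+0144, encoded in UTF-8 as two bytes
print([hex(b) for b in word.encode("utf-8")])  # ['0x42', '0x69', '0x65', '0xc5', '0x84']
print([hex(b) for b in upper_bytes(word)])     # ['0xc5', '0x84']
print(repr(misread_as_latin1(word)))           # 'BieÅ\x84'
```

The single character `ń` becomes two unrelated "characters", and each of
them would then be a candidate for translation into a TeX sequence.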

To stop this, you may need to specify something like the following on
the command line:

  latex2html -html_version 4.0,unicode  ...other-options...  <filename>
  latex2html -html_version 4.0,unicode,utf8  ......
or even
  latex2html -html_version 4.0,unicode,unicode  ......

Basically, the point is that you do *not* want LaTeX2HTML to assign
a special meaning to upper-8-bit codes and translate them into
something else.

Another way to prevent the conversion of characters is to explicitly
kill the subroutine where that conversion occurs, by redefining it:

 sub convert_iso_latin_chars { $_[1] }

Put the above line into your .latex2html-init file, or into another
init file used for your jobs.

In short, it shouldn't be too hard to make LaTeX2HTML do what you
want, if it doesn't do so already.

Hope this helps,

	Ross Moore

> Regards
> Janusz
> --
> dr hab. Janusz S. Bień, prof. UW
> Prof. Janusz S. Bien, Warsaw University
> jsbien at mimuw.edu.pl, jsbien at uw.edu.pl
> http://www.orient.uw.edu.pl/~jsbien/
> http://www.mimuw.edu.pl/~jsbien/
