[l2h] Re: LaTeX2HTML 2002 (1.67): latin2 option does not work?

Ross Moore ross@ics.mq.edu.au
Fri, 21 Jun 2002 00:26:50 +1000 (EST)

> Thanks for quick answer.

You're welcome.

> On Thu, 20 Jun 2002  Ross Moore <ross@ics.mq.edu.au> wrote:
> > 
> > to have latin2 as *both* the input encoding and the output encoding.
> > 
> Isn't this rather confusing? Where is it documented?

The topic is discussed in The LaTeX Web Companion (Addison-Wesley).

The problem is that input-encoding and output-encoding can be
two quite different things.
> > 
> > I suspect that your old machine had some special settings within its
> >  l2hconf.pm  file, which changed the input-encoding to  latin2 .
> > In this case, a single 'latin2' on the command-line might set the
> > output encoding.
> > Alternatively, it may have set both encodings with  l2hconf.pm 
> > or within a local (or site-wide) initialisation file (.latex2html-init) .
> With some effort I am able to reproduce the old configuration, but I
> definitely haven't done anything special in configuration files.
> I think that the earlier versions (at least some of them) simply
> recognized the option of `inputenc' package. The option is now ignored
> - is it intentional?

The original HTML specs gave  latin1  and  Unicode as the expected
output charsets. The reality of the web is different.

By default, LaTeX2HTML assumes an input-encoding of latin1,
and characters not in this set are converted to images.

Furthermore, all accent-characters that are actually in  latin1 
are translated to &#<num>; entities.

If you don't want &#<num>; entities, then you need to say  latin1 
again, to specify that this is what you want.

If you specify just  latin2  then this is taken to be the *input*
encoding, but the output charset is still  latin1 , as per the HTML specs.
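
To illustrate why the input encoding has to be stated, here is a small
Python sketch. It is not LaTeX2HTML's code: Python's iso8859_1 and
iso8859_2 codecs merely stand in for latin1 and latin2, and the helper
name  fits  is made up for this example. The same bytes give different
characters depending on the input encoding, and the latin2 reading
yields a character that has no slot in latin1.

```python
# Illustrative only: Python codecs stand in for the input/output encodings.
raw = b"Bie\xf1"  # the same four bytes as stored in a source file

as_latin1 = raw.decode("iso8859_1")  # read with latin1 as input encoding
as_latin2 = raw.decode("iso8859_2")  # read with latin2 as input encoding


def fits(ch: str, charset: str) -> bool:
    """Return True if the character can be written directly in the charset."""
    try:
        ch.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


print(as_latin1, as_latin2)
print(fits(as_latin2[-1], "iso8859_1"), fits(as_latin2[-1], "iso8859_2"))
```

A character for which  fits(ch, "iso8859_1")  is false is the kind that
ends up as an image when the output charset is latin1.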

If you don't like the images for occasional accented chars, then  you can
specify Unicode. This has the effect of extending the range of characters
that are translated to  &#<num>;  entities.
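
The form of these entities can be imitated in Python with the
xmlcharrefreplace  error handler. This is only a sketch of what the
output looks like, not of how LaTeX2HTML decides which characters to
translate; the function name is hypothetical.

```python
def to_numeric_entities(text: str) -> str:
    """Write every non-ASCII character as a decimal &#<num>; reference."""
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


# One character that is in latin1 and one that is only in latin2.
result = to_numeric_entities("\u00e9 \u0144")
print(result)
```

The number in each entity is the Unicode code point, which is why Unicode
is what extends the range beyond latin1.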

Since Unicode covers all characters in latin2 (as well as latin1) it becomes
possible to transform one charset into another.
This is essentially what happens with   latin2,Unicode,latin2 .
This now allows characters typed as 8-bit chars to stay that way,
and allows anything in latin1 but not latin2 to be translated to an &#<num>; entity.
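
The idea behind that combination can be sketched in Python as follows.
This is an assumption-laden illustration, not the actual implementation:
the function name and the  from_macros  parameter are invented here, with
the latter standing for characters that were produced by TeX macros
rather than typed as 8-bit chars.

```python
def latin2_roundtrip(raw: bytes, from_macros: str = "") -> bytes:
    """Decode latin2 input to Unicode, then re-encode as latin2 output.

    Characters with no latin2 slot become &#<num>; references.
    """
    text = raw.decode("iso8859_2") + from_macros
    return text.encode("iso8859_2", errors="xmlcharrefreplace")


# 0xF1 is n-acute in latin2; U+00F1 (n-tilde) is in latin1 but not latin2.
out = latin2_roundtrip(b"Bie\xf1 ", from_macros="\u00f1")
print(out)
```

The 8-bit latin2 character survives as the same byte, while the
latin1-only character comes out as a numeric entity.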

There is also a utf8 option, which allows the UTF8 output encoding to be used.
There is also an  entities  option, which uses entities like  &alpha; 
whenever possible --- unfortunately, there are few browsers that handle
these, so this output mode isn't used much.

> To conclude: for the time being I am quite satisfied as I can continue
> to use latex2html. However, the present handling of different
> character codes should not be considered as the final solution.

There is not any satisfactory alternative, so far as I can see.
It is quite wrong to assume that the output charset corresponds
to the input-encoding. Indeed, TeX lets you specify characters without using
any input encoding at all (i.e. ascii-only, with macros).
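
A toy Python sketch of that last point: the source below is pure ASCII,
yet the result still has to be written in some output charset. The macro
table and the regular expression are hypothetical and cover only a few
accent macros; this is not how LaTeX2HTML parses TeX.

```python
import re
import unicodedata

# Hypothetical, tiny table: TeX accent macro -> Unicode combining mark.
COMBINING = {
    "'": "\u0301",  # acute
    "`": "\u0300",  # grave
    "^": "\u0302",  # circumflex
    '"': "\u0308",  # diaeresis
    "~": "\u0303",  # tilde
}

# Matches forms such as \'n and \'{n}.
ACCENT = re.compile(r"""\\(['`^"~])\{?([A-Za-z])\}?""")


def tex_accents_to_unicode(source: str) -> str:
    """Replace simple accent macros by precomposed Unicode characters."""

    def repl(match: re.Match) -> str:
        composed = match.group(2) + COMBINING[match.group(1)]
        return unicodedata.normalize("NFC", composed)

    return ACCENT.sub(repl, source)


source = r"Bie\'n"  # ASCII only, no input encoding involved
text = tex_accents_to_unicode(source)

as_latin2 = text.encode("iso8859_2")
as_latin1 = text.encode("iso8859_1", errors="xmlcharrefreplace")
print(as_latin2, as_latin1)
```

The same ASCII source gives a single byte under a latin2 output charset
and a numeric entity under latin1, so the input alone cannot settle what
the output should be.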

So LaTeX2HTML really does need to be told both how to interpret the input
characters and what kind of output to produce.

If you have any suggestions, they'll certainly be considered.

Hope this helps,

	Ross Moore

> Best regards
> Janusz
> -- 
>                      ,   
> dr hab. Janusz S. Bien, prof. UW
> Prof. Janusz S. Bien, Warsaw Uniwersity
> http://www.orient.uw.edu.pl/~jsbien/
> ---------------------------------------------------------------------
> Na tym koncie czytam i wysylam poczte i wiadomosci offline.
> On this account I read/post mail/news offline.