[l2h] Re: LaTeX2HTML 2002 (1.67): latin2 option does not work?
Ross Moore
ross@ics.mq.edu.au
Fri, 21 Jun 2002 00:26:50 +1000 (EST)
> Thanks for quick answer.
You're welcome.
> On Thu, 20 Jun 2002 Ross Moore <ross@ics.mq.edu.au> wrote:
>
> >
> > to have latin2 as *both* the input encoding and the output encoding.
> >
>
> Isn't this rather confusing? Where is it documented?
The topic is discussed in The LaTeX Web Companion (Addison-Wesley).
The problem is that input-encoding and output-encoding can be
two quite different things.
> >
> > I suspect that your old machine had some special settings within its
> > l2hconf.pm file, which changed the input-encoding to latin2 .
> > In this case, a single 'latin2' on the command-line might set the
> > output encoding.
> > Alternatively, it may have set both encodings with l2hconf.pm
> > or within a local (or site-wide) initialisation file (.latex2html-init) .
>
> With some effort I am able to reproduce the old configuration, but I
> definitely haven't done anything special in configuration files.
>
> I think that the earlier versions (at least some of them) simply
> recognized the option of the `inputenc' package. That option is now
> ignored; is this intentional?
The original HTML specs gave latin1 and Unicode as the expected
output charsets. The reality of the web is different.
By default, LaTeX2HTML assumes an input encoding of latin1,
and characters not in this set are converted to images.
Furthermore, all accented characters that are actually in latin1
are translated to &#<num>; entities.
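To make this default behaviour concrete, here is a minimal Python sketch.
It is not LaTeX2HTML's own code (that is Perl); the name latin1_output and
the "[image ...]" placeholder are invented for illustration, the placeholder
standing in for the image that would really be generated. It assumes the
input has already been decoded into Unicode.

```python
def latin1_output(text: str) -> str:
    """Hypothetical sketch of latin1 output mode.

    ASCII passes through unchanged; other characters that exist in latin1
    (code points 0x80-0xFF) become decimal &#<num>; entities; anything
    outside latin1 gets a placeholder standing in for a generated image.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)                      # plain ASCII
        elif cp <= 0xFF:
            out.append(f"&#{cp};")              # in latin1: numeric entity
        else:
            out.append(f"[image U+{cp:04X}]")   # not in latin1: image
    return "".join(out)


# "caf\u00e9" has an e-acute, which is in latin1 -> "caf&#233;"
print(latin1_output("caf\u00e9"))
# b"Bie\xf1" is "Bien" with n-acute in latin2; n-acute is not in latin1
print(latin1_output(b"Bie\xf1".decode("iso-8859-2")))
```

So a Polish name typed in latin2 ends up with an image for the n-acute,
which is exactly the effect of specifying latin2 only as the input encoding.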
If you don't want &#<num>; entities, then you need to specify latin1
again, this time to say that it is also the output charset you want.
If you specify just latin2 then this is taken to be the *input*
encoding, but the output charset is still latin1 , as per the HTML specs.
If you don't like images for the occasional accented character, then you
can specify Unicode. This extends the range of characters that are
translated to &#<num>; entities.
Since Unicode covers all characters in latin2 (as well as latin1) it becomes
possible to transform one charset into another.
This is essentially what happens with latin2,Unicode,latin2 .
This allows characters typed as 8-bit chars to stay that way, and
anything in latin1 but not in latin2 to be translated to an &#<num>; entity.
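The latin2,Unicode,latin2 idea can be sketched with Python's standard
codecs: decode the latin2 input, work internally in Unicode, then encode
back to latin2, letting anything latin2 cannot represent fall back to a
decimal &#<num>; reference via errors="xmlcharrefreplace". This is only an
illustration of the principle; decode_input and encode_output are
hypothetical names, not part of LaTeX2HTML.

```python
def decode_input(raw: bytes) -> str:
    """Interpret the source bytes as latin2 (ISO-8859-2)."""
    return raw.decode("iso-8859-2")


def encode_output(text: str) -> bytes:
    """Emit latin2 bytes; characters latin2 lacks become &#<num>; refs."""
    return text.encode("iso-8859-2", errors="xmlcharrefreplace")


# n-acute typed as the 8-bit latin2 byte 0xF1, plus an a-grave that might
# come from a macro such as \`a: a-grave is in latin1 but not in latin2.
text = decode_input(b"Bie\xf1") + "\u00e0"
print(encode_output(text))
```

The n-acute stays a single 8-bit byte, while the a-grave comes out as
&#224;, matching the behaviour described above.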
There is also a utf8 option, which allows UTF-8 to be used as the output encoding.
And there is an entities option, which uses named entities such as &alpha;
whenever possible; unfortunately, few browsers handle these,
so this output mode isn't used much.
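For comparison, an entities-style output can be sketched with Python's
html.entities.codepoint2name table (the HTML 4 named entities), preferring
a named entity and falling back to a numeric one. This is a hypothetical
illustration (entities_output is an invented name), not how LaTeX2HTML
actually chooses entity names.

```python
from html.entities import codepoint2name  # code point -> HTML 4 entity name


def entities_output(text: str) -> str:
    """Prefer named entities, fall back to &#<num>; for the rest."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)                          # plain ASCII
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")   # e.g. &alpha;
        else:
            out.append(f"&#{cp};")                  # no HTML 4 name exists
    return "".join(out)


print(entities_output("\u03b1 = \u00e9"))  # Greek alpha and e-acute
print(entities_output("\u0144"))           # n-acute has no HTML 4 name
```

Note that many latin2 letters, such as n-acute, have no HTML 4 entity name
at all, so even this mode still needs numeric references for them.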
> To conclude: for the time being I am quite satisfied as I can continue
> to use latex2html. However, the present handling of different
> character codes should not be considered as the final solution.
There is no satisfactory alternative, as far as I can see.
It is quite wrong to assume that the output charset corresponds
to the input encoding. Indeed, TeX lets you specify characters without using
any input encoding at all (i.e. ASCII only, using macros).
So LaTeX2HTML really does need to be told both how to interpret the input
characters, as well as what kind of output to produce.
If you have any suggestions, they'll certainly be considered.
Hope this helps,
Ross Moore
> Best regards
>
> Janusz
>
> --
> dr hab. Janusz S. Bien, prof. UW
> Prof. Janusz S. Bien, Warsaw University
> http://www.orient.uw.edu.pl/~jsbien/
> ---------------------------------------------------------------------
> On this account I read/post mail/news offline.