[l2h] Confused about Unicode support

Andreas Strotmann <strotman@nu.cs.fsu.edu>
Wed, 30 Jun 1999 14:19:47 -0400 (EDT)


Hi,

sorry to butt in here, but to me it sounds like you're trying to do
something "wrong" here:  among the 8-bit Latin character sets, Unicode is
compatible with (an extension of) Latin1 *only*.  Specifying
latin1,unicode sounds legal, though the "latin1" part would be
superfluous.  Latin2 and Unicode, however, are decidedly *in*compatible,
so specifying both is bound to produce confusion and errors (do you
interpret the characters in the 128-255 range as Unicode (=Latin1) or as
Latin2?).
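
To make the ambiguity concrete, here is a minimal Python sketch. It has
nothing to do with how latex2html works internally; it only uses Python's
standard iso-8859-1 and iso-8859-2 codecs, and the sample bytes are my
own choice for illustration.

```python
# Three bytes from the upper half of an 8-bit encoding, as they might
# appear directly in a source file (sample values chosen for illustration).
raw = bytes([0xA3, 0xB3, 0xF3])

# The same bytes read under the two competing interpretations.
as_latin1 = raw.decode("iso-8859-1")
as_latin2 = raw.decode("iso-8859-2")

# Latin1 maps every byte to the Unicode code point with the same number,
# which is why Unicode counts as an extension of Latin1.
latin1_matches_unicode = all(
    bytes([b]).decode("iso-8859-1") == chr(b) for b in range(256)
)

# Upper-half byte values on which Latin1 and Latin2 disagree.
differing = [
    b for b in range(0xA0, 0x100)
    if bytes([b]).decode("iso-8859-1") != bytes([b]).decode("iso-8859-2")
]

print([hex(ord(ch)) for ch in as_latin1])
print([hex(ord(ch)) for ch in as_latin2])
print(latin1_matches_unicode, len(differing))
```

Byte 0xA3 is the pound sign under Latin1 but the Polish capital L with
stroke under Latin2, while 0xF3 is o-acute in both, so some characters
survive the mix-up and others do not.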

In other words, I don't think the solution proposed below is one.  If you
specify Unicode, you automatically have Latin1, period.  Any other
character set, including any of the Latin2-9 or so series, would be
illegal once Unicode has been specified.

Or did you want one of these (presumably, Latin2) as *input* and the other
(Unicode?) as output?
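
If that is the intent, the conversion would look roughly like the Python
sketch below: decode the input bytes as Latin2, then write every
non-ASCII character as a numeric character reference carrying its
Unicode code point (which is what HTML's &#...; notation refers to,
whatever charset the page declares).  The function name and the sample
word are hypothetical, picked only for illustration; this is not
latex2html's own code.

```python
def latin2_to_html_entities(raw: bytes) -> str:
    """Decode Latin2 bytes and emit non-ASCII characters as &#N; references.

    N is the Unicode code point, not the Latin2 byte value.
    (Hypothetical helper for illustration only.)
    """
    text = raw.decode("iso-8859-2")
    return "".join(
        ch if ord(ch) < 128 else f"&#{ord(ch)};"
        for ch in text
    )


# The word "Lodz" with its Polish diacritics, written with escapes so the
# example does not depend on the encoding of this message.
sample = "\u0141\u00f3d\u017a".encode("iso-8859-2")

print(list(sample))                     # Latin2 byte values of the input
print(latin2_to_html_entities(sample))  # references use Unicode numbers
```

The Latin2 byte values (163 and 188 for the first and last letters)
differ from the Unicode numbers that end up in the references (321 and
378); only the o-acute keeps the same number, 243, in both.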

-- Andreas

On Wed, 30 Jun 1999, Mariusz Pietrzak wrote:

> Hi,
> 
> Ross MOORE wrote:
> > Hmm. It certainly works correctly if you use \L and \l
> > for the Polish L characters; so I presume that you are using
> > upper-plane (129-255) characters directly in the source, yes ?
> 
> Yes
> 
> > OK, I think I see what is causing the problem.
> > In the file  ...../versions/unicode.pl
> > there is a line near the top:
> > 
> >         require("$LATEX2HTMLVERSIONS${dd}latin1.pl");
> > 
> > Change this to read:
> > 
> > require("$LATEX2HTMLVERSIONS${dd}latin1.pl") if ($CHARSET =~/iso\-8859\-1/);
> > 
> 
> Thanks, the patch works, but ...
> how about generating "Polish" characters without using an 8-bit font
> (and without using images), by using the standard commands:
> \k{a} \'c \k{e} \l{} \'n \'o \'s \'z \.z
> \k{A} \'C \k{E} \L{} \'N \'O \'S \'Z \.Z
> This worked with the "-html_version 3.2,latin2,unicode" switch.
> Now (after the above patch) it works except for \'o and \'O (l2h can't
> convert them into the available encodings; is that OK? Before the patch
> it could).
> And when using Latin2 output ("-html_version 3.2,latin2"), the
> characters generated as above appear as &#<latin2_number> (only \'o
> and \'O appear as regular characters), so at least my Netscape can't
> display them correctly. I think that &# requires the Unicode number
> (regardless of the selected charset), and maybe in the future it would
> be possible to generate 8-bit characters rather than entities.
> 
> And one more question:
> Is there a difference between \usepackage[latin2]{inputenc}
> and setting latin2 using $CHARSET and $HTML_VERSION? Which
> one is the better way?
> 
> PS: In the manual, page 15, I think that there should be
> $TITLES_LANGUAGE = 'french'; rather than $LANGUAGE_TITLES = ...
> 
> Regards
> 
> Mariusz Pietrzak
> mariuszp@polbox.pl
>