[l2h] The L2H 2002 Cannot deal CJK document correctly!

Ross Moore ross@ics.mq.edu.au
Wed, 24 Apr 2002 14:04:30 +1000 (EST)


 Ross,


>   I made a very small example, so I send it to the list.
> 
> > Then I'll be able to see what is the difference, and diagnose why 
> > it happens. Given this, hopefully devising a fix will not be difficult.
> 
>   Thanks in advance.
> 
>   I didn't get any useful messages, but Chinese in the header (and maybe
>   elsewhere) became encoded just like,
> 
> >   ``»115ÿ§64ÿ¤......''
> > 
> > >   Yes, we need man and time.
> > 
> > Right, but I just need an example to see what is the problem.
> 
>   Sorry, I should have made an example or tried to find out what is
>   going on first.

OK; I've got it, and can reproduce the problem.

The fix is easy, but first a question.
Your example HTML files correctly have  charset = text/big5 .
Where is this done in the processing, or do you do it yourself
after LaTeX2HTML has finished ?

By simply inserting 2 lines into  CJK.perl  the problem
is fixed, and this charset is set automatically:


        package main;

        $charset = 'big5';      ## insert these 2 lines
        $CHARSET = 'big5';      ##

        sub pre_pre_process {
        ...
        ...


This should be sufficient for documents that have just Big5 characters.

Please advise if you have example documents where this is not sufficient.


Without these charset settings, the errors occurred because some
8-bit characters were being translated back to TeX accents, or to
macros for mathematical symbols, according to the latin-1 meaning of
those characters. This is clearly inappropriate for a CJK document.
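
To make the byte-level effect concrete, here is a minimal sketch in
Python. It is purely an illustration and is not part of LaTeX2HTML or
CJK.perl; the sample text and the variable names are my own assumptions.
It encodes two Chinese characters as Big5, then decodes the same bytes
once as latin-1 and once as Big5. Under latin-1 each byte becomes a
separate symbol or accented letter, which is the kind of character that
can then be mapped back to a TeX macro.

```python
# Illustration only: how Big5 bytes look when misread as latin-1.
# The sample text (U+4E2D U+6587) is a hypothetical example,
# not taken from the document that showed the problem.
text = "\u4e2d\u6587"

# Each Big5 character occupies two bytes.
big5_bytes = text.encode("big5")

# Wrong reading: every byte is treated as one latin-1 character,
# so 8-bit bytes turn into currency signs, accented letters, etc.
as_latin1 = big5_bytes.decode("latin-1")

# Right reading: byte pairs are decoded as Big5 characters.
as_big5 = big5_bytes.decode("big5")

print("bytes      :", big5_bytes.hex())
print("as latin-1 :", as_latin1)
print("as Big5    :", as_big5)
print("round trip :", as_big5 == text)
```

With the charset declared as big5, the bytes are left to be read as
pairs, so the latin-1 interpretation never gets a chance to apply.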


> 
> 
> Rgds,
> Edward G.J. Lee


Hope this helps,

	Ross Moore