<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#ffffff" text="#000000">
On 07/23/2011 10:17 AM, Johannes Wilm wrote:
<blockquote
cite="mid:CABkgm-T_nOuyk=-APPZ9vaDsmdmi3hNBLrRLseh-JkuUa-u=Hg@mail.gmail.com"
type="cite">Hi,
<div><br>
</div>
<div>On the attached test file I tried to run</div>
<div><i><br>
</i></div>
<div><i>dvilualatex unicode.tex</i></div>
<div><i>dvilualatex unicode.tex</i></div>
<div><i>dvilualatex unicode.tex</i></div>
<div><i>tex4ht -f/unicode.tex -cunihtf -utf8</i></div>
<div><br>
</div>
<div>I cannot figure out as what the characters are encoded in the
output, but it doesn't seem to be utf8. Output has been
attached.</div>
</blockquote>
<br>
Can your example produce a valid DVI? In my tests, it didn't. TeX4ht
needs a valid DVI to generate HTML. The post-processor, the tex4ht
binary, extracts the textual characters from the DVI by making a
clever substitution based on the *.tfm of the font used and the
corresponding *.htf (hypertext font) file. The post-processor needs
the *.tfm, which unfortunately is not available for Unicode fonts,
so it falls back to cmr. The resulting HTML file will not be usable,
owing to the Unicode characters appearing as junk.<br>
<br>
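For reference, here is a sketch of the shape of an *.htf file (a
hypothetical two-character example, not taken from an actual
distribution): the first line names the font and gives the first and
last character codes, each following line gives the output string for
one character, and the final line repeats the first.<br>
<pre>
cmr 97 98
'a' ''
'b' ''
cmr 97 98
</pre>
The tex4ht binary matches each character it reads from the DVI
against such a table; for a Unicode font with no *.tfm (and no
matching *.htf), there is no table to consult, hence the fallback
to cmr described above.<br>
<br>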
If somebody comes forward with a patch to the tex4ht binary that can
post-process the DVI without the help of *.tfm files, it will be a
great contribution. Patching at the macro-package level is easier
than patching at the binary level. Volunteers are welcome.<br>
<pre class="moz-signature" cols="72">--
Radhakrishnan
"It's today!" said Piglet.
"My favorite day," said Pooh.</pre>
</body>
</html>