<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

  </head>

  <body bgcolor="#ffffff" text="#000000">

    On 07/23/2011 10:17 AM, Johannes Wilm wrote:

    <blockquote

cite="mid:CABkgm-T_nOuyk=-APPZ9vaDsmdmi3hNBLrRLseh-JkuUa-u=Hg@mail.gmail.com"

      type="cite">Hi,

      <div><br>

      </div>

      <div>On the attached test file I tried to run</div>

      <div><i><br>

        </i></div>

      <div><i>dvilualatex unicode.tex</i></div>

      <div><i>dvilualatex unicode.tex</i></div>

      <div><i>dvilualatex unicode.tex</i></div>

      <div><i>tex4ht -f/unicode.tex -cunihtf -utf8</i></div>

      <div><br>

      </div>

      <div>I cannot figure out which encoding the characters in the
        output use, but it doesn't seem to be UTF-8. The output has been
        attached.</div>

    </blockquote>

    <br>

    Can your example produce a valid DVI file? In my tests, it didn't.
    TeX4ht needs a valid DVI file to generate HTML. The post-processor,
    the tex4ht binary, extracts the textual characters from the DVI
    through a clever substitution based on the *.tfm (TeX font metric)
    file of each font used and the matching *.htf (hypertext font) file.
    The post-processor needs the *.tfm files, which unfortunately are not
    available for Unicode fonts, so it falls back to cmr. The resulting
    HTML file is unusable because the Unicode characters appear as
    junk.<br>
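    <br>
    To make the mechanism above concrete, here is a rough Python sketch.
    It is a simplified, hypothetical model, not the actual tex4ht code:
    each glyph in the DVI is treated as a (font, code) pair, a per-font
    table maps codes to HTML text, and a font with no usable table is
    looked up in the cmr table instead. The font names, the table
    contents and the functions substitute and extract are illustrative
    assumptions; real *.htf files have their own syntax and are read by
    the tex4ht binary.<br>
<pre>
```python
# Hypothetical, simplified model of tex4ht's character substitution.
# Font names and table contents are illustrative only; this is not
# the real *.htf syntax or the tex4ht implementation.

# Per-font tables: glyph code in the DVI -> HTML text to emit.
HTF_TABLES = {
    # In the OT1 encoding used by cmr, position 123 is the en dash.
    "cmr10": {65: "A", 66: "B", 123: "\u2013"},
}

# Assumed fallback when no table is available for a font.
FALLBACK_FONT = "cmr10"


def substitute(font: str, code: int) -> str:
    """Map one DVI glyph (font, code) to HTML text."""
    table = HTF_TABLES.get(font)
    if table is None:
        # No table for this font: fall back to the cmr table.
        table = HTF_TABLES[FALLBACK_FONT]
    # Codes missing from the table come out as visible junk "[code]".
    return table.get(code, f"[{code}]")


def extract(glyphs) -> str:
    """Concatenate the HTML text for a sequence of (font, code) glyphs."""
    return "".join(substitute(font, code) for font, code in glyphs)


if __name__ == "__main__":
    # A hypothetical Unicode font: 'A' (65) survives by coincidence,
    # but e-acute (233) has no entry in the cmr table.
    print(extract([("lmroman10-regular", 65), ("lmroman10-regular", 233)]))
```
</pre>
    In this toy model the call above prints A[233]: the ASCII letter
    happens to match, while the accented character turns into junk,
    which mirrors the symptom reported in the quoted message.<br>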

    <br>

    A patch to the tex4ht binary that can post-process a DVI file
    without the help of *.tfm files would be a great contribution.
    Patching at the macro-package level is easier than patching the
    binary. Volunteers are welcome.<br>

    <pre class="moz-signature" cols="72">-- 
Radhakrishnan
"It's today!" said Piglet.
"My favorite day," said Pooh.</pre>

  </body>

</html>